Zhou 09/768686 page 1



=> fil capl; d que l9; fil PASCAL, JICST-EPLUS, INSPEC, LIFESCI, BIOSIS, ANABSTR,
SCISEARCH; d que l66; fil wpids; d que l3

FILE 'CAPLUS' ENTERED AT 16:02:11 ON 22 OCT 2003 

USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT.

PLEASE SEE "HELP USAGETERMS" FOR DETAILS.

COPYRIGHT (C) 2003 AMERICAN CHEMICAL SOCIETY (ACS) 



Copyright of the articles to which records in this database refer is 
held by the publishers listed in the PUBLISHER (PB) field (available 
for records published or updated in Chemical Abstracts after December 
26, 1996), unless otherwise indicated in the original publications. 
The CA Lexicon is the copyrighted intellectual property of the 
American Chemical Society and is provided to assist you in searching 
databases on STN. Any dissemination, distribution, copying, or storing 
of this information, without the prior written consent of CAS, is 
strictly prohibited. 

FILE COVERS 1907 - 22 Oct 2003 VOL 139 ISS 17 
FILE LAST UPDATED: 21 Oct 2003 (20031021/ED)

This file contains CAS Registry Numbers for easy and accurate 
substance identification. 



L1       24 SEA FILE=CAPLUS ABB=ON BUSA W?/AU
L6    40840 SEA FILE=CAPLUS ABB=ON DATABASE #
L7    83251 SEA FILE=CAPLUS ABB=ON INFER?
L8    71908 SEA FILE=CAPLUS ABB=ON ALGORITHM*
L9        0 SEA FILE=CAPLUS ABB=ON L1 AND (L6 OR L7 OR L8)



FILE 'PASCAL' ENTERED AT 16:02:11 ON 22 OCT 2003
Any reproduction or dissemination in part or in full,
by means of any process and on any support whatsoever
is prohibited without the prior written agreement of INIST-CNRS.
COPYRIGHT (C) 2003 INIST-CNRS. All rights reserved.

FILE 'JICST-EPLUS' ENTERED AT 16:02:11 ON 22 OCT 2003
COPYRIGHT (C) 2003 Japan Science and Technology Corporation (JST)

FILE 'INSPEC' ENTERED AT 16:02:11 ON 22 OCT 2003
Compiled and produced by the IEE in association with FIZ KARLSRUHE
COPYRIGHT 2003 (c) INSTITUTION OF ELECTRICAL ENGINEERS (IEE)

FILE 'LIFESCI' ENTERED AT 16:02:11 ON 22 OCT 2003
COPYRIGHT (C) 2003 Cambridge Scientific Abstracts (CSA)

FILE 'BIOSIS' ENTERED AT 16:02:11 ON 22 OCT 2003

L3 ANSWER 1 OF 3 WPIDS COPYRIGHT 2003 THOMSON DERWENT on STN
ACCESSION NUMBER: 2002-454466 [48] WPIDS 

DOC. NO. CPI: C2002-129179 

TITLE: Quantifying target gene expression in living cells that
possess a target gene of interest tagged with the binding
site for an RNA binding protein and fluorescently labeled
RNA binding polypeptide including an RNA binding domain.

DERWENT CLASS: B04 D16

INVENTOR(S): BUSA, W B

PATENT ASSIGNEE(S): (CELL-N) CELLOMICS INC; (BUSA-I) BUSA W B 
COUNTRY COUNT: 95 
PATENT INFORMATION: 
APPLICATION DETAILS:

PATENT NO        KIND              APPLICATION        DATE
WO 2002027031    A2                WO 2001-US30438    20010928
AU 2001094872    A                 AU 2001-94872      20010928
US 2003096243    A1 Provisional    US 2000-236407P    20000928
                                   US 2001-965876     20010928

FILING DETAILS:

PATENT NO        KIND                PATENT NO
AU 2001094872    A     Based on      WO 2002027031

PRIORITY APPLN. INFO: US 2000-236407P 20000928; US 2001-965876 20010928

AB WO 200227031 A UPAB: 20020730 

NOVELTY - Quantifying (M1) expression of target genes in living cells
comprising:

(1) providing cells that possess a target gene of interest which has
been tagged with the binding site for an RNA binding protein and a
fluorescently labeled RNA binding polypeptide (I) that includes an RNA
binding domain; and

(2) calculating the quantity of target gene expression in the cells
using fluorescence signaling techniques.

DETAILED DESCRIPTION - Quantifying (M1) expression of one or more
target genes in living cells comprising:

(a) providing cells that possess at least a first fluorescently
labeled RNA binding polypeptide (I) which comprises first RNA binding
domain (RBD1), and at least a first target gene of interest (T1) that has
been modified to comprise one or more nucleic acid sequences encoding a
first binding site (BS1) for RBD1 where, upon expression of (T1) into
first target RNA, BS1 is specifically bound by the first fluorescently
labeled (I);

(b) scanning the cells to obtain fluorescent signals from the first
fluorescently labeled (I);

(c) determining fluorescent emission intensities from the first
fluorescently labeled (I) at two different wavelengths;

(d) calculating a ratio of the fluorescent emission intensities from
the first fluorescently labeled (I) at the two different wavelengths; and

(e) calculating a quantity of the first target RNA in the cells from
the ratio.

An INDEPENDENT CLAIM is included for a fluorescently labeled (I)
comprising:

(a) a non-naturally occurring amino acid sequence comprising:

(i) a nuclear export signal; and

(ii) an RNA binding domain; and

(b) a fluorophore pair such as a donor/acceptor pair for fluorescence
resonance energy transfer (FRET), an excimer forming fluorophore pair, or
an exciplex forming fluorophore pair.

USE - (M1) is useful for quantifying expression of one or more target
genes in living cells which comprise two or more distinct populations of
cells (claimed). The method is used to quantitate the expression of any
target gene, including expression of protein-encoding messenger RNA genes,
ribosomal RNA encoding genes, and transfer RNA encoding genes, so long as
the RNA expression product from the target gene possesses a sequence or
structure (the RNA tag) that is bound specifically by the RNA binding
polypeptide being used.
Dwg.0/3 
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INVENTOR (S) : 
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PATENT INFORMATION 
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Automated inference creation involves analyzing 
connection network constructed using records from 
inference database, to determine inference regarding 
physico-chemical relation between chemical or biological 
molecules.
APPLICATION DETAILS:

PATENT NO        KIND    APPLICATION        DATE
WO 2001055950    A2      WO 2001-US2245     20010124
AU 2001032928    A       AU 2001-32928      20010124
EP 1252596       A2      EP 2001-905006     20010124
                         WO 2001-US2245     20010124



FILING DETAILS:

PATENT NO        KIND                PATENT NO
AU 2001032928    A     Based on      WO 2001055950
EP 1252596       A2    Based on      WO 2001055950

PRIORITY APPLN. INFO: US 2001-769169 20010124; US 2000-177964P 20000125

AB WO 200155950 A UPAB: 20021209 

NOVELTY - Co-occurrence count is set to starting values of co-occurring 
preset name and filtered chemical or biological molecule name, when 
filtered name is not stored in inference database (24,26). Co-occurrence 
count is incremented for each pair of preset name, when stored in 
database. Connection network is constructed using records from database 
and analyzed to determine inferences regarding relationships between 
chemical or biological molecules. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the
following:

(a) Method for checking automatically created inferences;

(b) Automated inference system.

USE - For creating automated inferences for physico-chemical 
interactions through co-occurrence analysis of inference databases. 

ADVANTAGE - Allows scientists and researchers to automatically create 
and check inferences of physico-chemical interaction through co-occurrence 
analysis of indexed databases. Facilitates user's understanding of 
biological functions, such as cell function, to design experiments more
intelligently and to analyze experimental results more thoroughly. Helps 
drug discovery scientists select better targets for pharmaceutical 
intervention in hope of curing diseases. 

DESCRIPTION OF DRAWING(S) - The figure shows the exemplary
experimental data storage system for storing experimental data.

Inference database 24,26
Dwg.1/4
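
Searcher's illustration (not part of the database record): the counting step in the NOVELTY paragraph above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented method. The function name `count_cooccurrences`, the starting value of 1, and the sample molecule names are all hypothetical; the record does not give the actual starting values or data layout.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(records):
    """Count how often each unordered pair of molecule names shares a record.

    `records` is an iterable of name lists, one list per indexed record.
    Assumption: a pair seen for the first time starts at 1 and is
    incremented by 1 on every later record in which it appears.
    """
    counts = Counter()
    for names in records:
        # Deduplicate and sort so (A, B) and (B, A) map to the same key.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data, for illustration only.
records = [
    ["insulin", "glucose"],
    ["glucose", "insulin", "glucagon"],
    ["glucagon", "glucose"],
]
counts = count_cooccurrences(records)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Each key in `counts` can be read as an arc between two nodes of a connection network, with the count as the arc weight.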
WPIDS COPYRIGHT 2003 THOMSON DERWENT on STN
ACCESSION NUMBER: 2001-476263 [51] WPIDS
CROSS REFERENCE: 2001-496878 [54]
DOC. NO. NON-CPI: N2001-352481
DOC. NO. CPI: C2001-142902
TITLE: Strength measurement of co-occurrence data for automated
interference of physico-chemical interaction knowledge,
involves determining if co-occurrence between at least
two chemical or biological molecule names is non-trivial
DERWENT CLASS: B04 D16 T01
INVENTOR(S): BUSA, W B
PATENT ASSIGNEE(S):
COUNTRY COUNT:
PATENT INFORMATION:
PRIORITY APPLN. INFO: US 2001-768686 20010124; US 2000-177964P 20000125;
US 2000-201105P 20000502; US 2001-769169 20010124

AB WO 200155951 A UPAB: 20021209 

NOVELTY - A strength of co-occurrence data is measured by extracting at 
least two chemical or biological molecule names from database record; and 
determining likelihood statistic for co-occurrence reflecting 
physico-chemical interactions between the two molecule names, and applying 
it to the co-occurrence to determine if co-occurrence between the molecule 
names is non-trivial.

DETAILED DESCRIPTION - Strength measurement of co-occurrence data 
involves extracting at least two chemical or biological molecule names 
from database record from an interference database; determining likelihood 
statistic for co-occurrence reflecting physico-chemical interactions 
between the two molecule names (A and B); and applying the likelihood
statistic to the co-occurrence to determine if the co-occurrence between 
molecule A and molecule B is non-trivial. The interference database 
includes those records created from an indexed literature database. The 
two molecule names co-occur in at least one record in an indexed 
scientific literature database. 

An INDEPENDENT CLAIM is also included for: 

(1) a method of contextual querying of co-occurrence data comprising 
selecting a target node from a first list of nodes connected by arcs in a 
connection network; creating a second list of nodes by considering other
nodes that are neighbors of the target node and other nodes prior to
the target node in the connection network; selecting a next node from the
second list of nodes using the co-occurrence values, in which the next 
node is next after the target node in the pre-determined order for the 
connection network based on the co-occurrence values; 

(2) method of query polling of co-occurrence data comprising 
selecting a position in connection network for an unknown target node from 
a first list of nodes; determining a second list of nodes prior to the 
position of unknown target node in the connection network; determining a 
third list of nodes subsequent to the position of unknown target node in 
the connection network; determining a fourth list of nodes included in 
both the second and the third lists of nodes; and determining an identity 
for the unknown target node by selecting a node from the fourth list of 
nodes using likelihood statistic; and 

(3) a method for creating automated biological interferences 
comprising constructing a connection network using at least one database 
record from an interference database; applying likelihood statistics 
analysis methods to the connection network; generating automatically at 
least one biological interferences relationships between chemical or 
biological molecules or biological processes using the results from the 
likelihood statistic analysis methods. 

USE - The method is for automated interference of physico-chemical 
interaction knowledge from databases of term co-occurrence data. It can 
also be used to facilitate a user's understanding of biological functions, 
e.g. cell functions, to design experiments, and to analyze experiment
results.

ADVANTAGE - The method helps drug discovery scientists select better
targets for pharmaceutical intervention of curing diseases. It may also
help facilitate the abstraction of knowledge from information for
biological experimental data and provides new bioinformatic techniques.
Dwg.0/9
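
Searcher's illustration (not part of the database record): the abstract above does not say which likelihood statistic is applied. One common choice for term co-occurrence is the log-likelihood ratio (G-squared) on a 2x2 contingency table, sketched below purely as an assumption. The names `g_squared` and `is_nontrivial`, and the cut-off of 3.841 (the chi-square critical value for 1 degree of freedom at p = 0.05), are illustrative choices and are not taken from the record.

```python
import math


def g_squared(n_ab, n_a, n_b, n_total):
    """Log-likelihood ratio statistic for independence of names A and B.

    n_ab: records naming both A and B; n_a: records naming A;
    n_b: records naming B; n_total: all records considered.
    """
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n_total - n_a - n_b + n_ab],
    ]
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row[i] * col[j] / n_total
                g += o * math.log(o / expected)
    return 2.0 * g


def is_nontrivial(n_ab, n_a, n_b, n_total, threshold=3.841):
    """True when A and B co-occur more often than chance, by the assumed cut-off."""
    expected_ab = n_a * n_b / n_total
    return n_ab > expected_ab and g_squared(n_ab, n_a, n_b, n_total) > threshold


# Hypothetical counts: 50 joint records out of 1000, each name in 100 records.
print(is_nontrivial(50, 100, 100, 1000))
```

With these hypothetical counts, chance alone would predict 10 joint records, so 50 observed joint records is flagged; exactly 10 would not be.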
L6    40840 SEA FILE=CAPLUS ABB=ON DATABASE #

L37   78557 SEA FILE=WPIDS ABB=ON DATABASE* OR ALGORITHM*
L45     545 SEA FILE=WPIDS ABB=ON STRUCTUR? (2A) ACTIVIT?
L47      36 SEA FILE=WPIDS ABB=ON L37 AND L45
L48      24 SEA FILE=WPIDS ABB=ON L47 AND T/DC
L49      13 SEA FILE=WPIDS ABB=ON L48 AND B04/DC
L50       4 SEA FILE=WPIDS ABB=ON L49 AND (ACTIVITY' OR DESCRIPTOR*)/TI



L37   78557 SEA FILE=WPIDS ABB=ON DATABASE* OR ALGORITHM*
L40    5986 SEA FILE=WPIDS ABB=ON (CHEMICAL OR BIOLOGICAL) (3A) (MOLECULE*
            OR STRUCTUR?)
L45     545 SEA FILE=WPIDS ABB=ON STRUCTUR? (2A) ACTIVIT?
L51     369 SEA FILE=WPIDS ABB=ON CONNECTION NETWORK*
L52  160517 SEA FILE=WPIDS ABB=ON NODE* OR ARC*
L54       3 SEA FILE=WPIDS ABB=ON (L40 OR L45) AND (L51 OR L52) AND L37
            AND (INTERACTION* OR RELATION?)/TI
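
Searcher's illustration (not part of the STN output): the proximity terms in the query above, such as STRUCTUR? (2A) ACTIVIT?, are understood here to match the two truncated terms in either order with at most two words between them. The sketch below shows that reading only. The function name `within_n` is hypothetical, the "?" truncation is approximated by a simple prefix match, and STN's real tokenization and stop-word handling are not modeled.

```python
import re


def within_n(text, stem_a, stem_b, n):
    """True if a word starting with stem_a and a word starting with stem_b
    occur in either order with at most n words between them."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w.startswith(stem_a)]
    pos_b = [i for i, w in enumerate(words) if w.startswith(stem_b)]
    # Adjacent words differ by 1 in position, so n intervening words
    # corresponds to a position difference of at most n + 1.
    return any(0 < abs(i - j) <= n + 1 for i in pos_a for j in pos_b)


# Hypothetical title fragments, for illustration only.
print(within_n("Quantitative structure activity relationships", "structur", "activit", 2))
print(within_n("Structural basis of the observed activity", "structur", "activit", 2))
```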



=> s (l42 or l44 or l50 or l54) not l3

L98       9 (L42 OR L44 OR L50 OR L54) NOT (L3)

=> fil PASCAL, JICST-EPLUS, INSPEC, LIFESCI, BIOSIS, ANABSTR, SCISEARCH 

FILE 'PASCAL' ENTERED AT 16:04:06 ON 22 OCT 2003
Any reproduction or dissemination in part or in full,
by means of any process and on any support whatsoever
is prohibited without the prior written agreement of INIST-CNRS.
COPYRIGHT (C) 2003 INIST-CNRS. All rights reserved.

FILE 'JICST-EPLUS' ENTERED AT 16:04:06 ON 22 OCT 2003
COPYRIGHT (C) 2003 Japan Science and Technology Corporation (JST)

FILE 'INSPEC' ENTERED AT 16:04:06 ON 22 OCT 2003
Compiled and produced by the IEE in association with FIZ KARLSRUHE
COPYRIGHT 2003 (c) INSTITUTION OF ELECTRICAL ENGINEERS (IEE)

FILE 'LIFESCI' ENTERED AT 16:04:06 ON 22 OCT 2003
COPYRIGHT (C) 2003 Cambridge Scientific Abstracts (CSA)

FILE 'BIOSIS' ENTERED AT 16:04:06 ON 22 OCT 2003
COPYRIGHT (C) 2003 BIOLOGICAL ABSTRACTS INC.(R)

FILE 'ANABSTR' ENTERED AT 16:04:06 ON 22 OCT 2003
COPYRIGHT (c) 2003 THE ROYAL SOCIETY OF CHEMISTRY (RSC)

FILE 'SCISEARCH' ENTERED AT 16:04:06 ON 22 OCT 2003
COPYRIGHT 2003 THOMSON ISI



=> d que l71; d que l68; d que l75; d que l80; d que l87; d que l83; d que l88; d que l90;
d que l96

L59 98530 SEA INFERENCE* 

L61 153155 SEA PHYSICOCHEMICAL OR PHYSICO CHEMICAL 

L64 1326996 SEA DATABASE* OR ALGORITHM* 

L67 2346 SEA L59(3A) L64 

L71 0 SEA L67 AND L61 



L57 174626 SEA (MOLECUL? OR STRUCTUR?) (5A) (BIOLOGICAL? OR CHEMICAL?) 

L59 98530 SEA INFERENCE* 
L64 1326996 SEA DATABASE* OR ALGORITHM*

L56  142316 SEA ABB=ON STRUCTUR? (3A) ACTIVIT?
L57  174626 SEA ABB=ON (MOLECUL? OR STRUCTUR?) (5A) (BIOLOGICAL? OR
            CHEMICAL?)
L58  610997 SEA ABB=ON (MOLECUL? OR STRUCTUR?) (5A) (PROCESS? OR FUNCTION?
            OR ACTIVIT?)
L60   14988 SEA COOCCUR? OR CO OCCUR?
L61  153155 SEA ABB=ON PHYSICOCHEMICAL OR PHYSICO CHEMICAL
L62     363 SEA ABB=ON CONNECTION NETWORK*
L63  596298 SEA ABB=ON NODE* OR NODAL? OR ARC*
L64 1326996 SEA ABB=ON DATABASE* OR ALGORITHM*
L78      16 SEA ((L56 OR L57 OR L58) OR L61) AND (L62 OR L63 OR L64) AND
            L60
L80       2 SEA L78 AND PROTEIN/TI



L56  142316 SEA STRUCTUR? (3A) ACTIVIT?
L57  174626 SEA (MOLECUL? OR STRUCTUR?) (5A) (BIOLOGICAL? OR CHEMICAL?)
L58  610997 SEA (MOLECUL? OR STRUCTUR?) (5A) (PROCESS? OR FUNCTION? OR
            ACTIVIT?)
L59   98530 SEA INFERENCE*
L61  153155 SEA PHYSICOCHEMICAL OR PHYSICO CHEMICAL
L82   11219 SEA L61 AND (L56 OR L57 OR L58)
L86      12 SEA L82 AND L59
L87       1 SEA L86 AND REASONING/TI



L56  142316 SEA STRUCTUR? (3A) ACTIVIT?
L57  174626 SEA (MOLECUL? OR STRUCTUR?) (5A) (BIOLOGICAL? OR CHEMICAL?)
L58  610997 SEA (MOLECUL? OR STRUCTUR?) (5A) (PROCESS? OR FUNCTION? OR
            ACTIVIT?)
L59   98530 SEA INFERENCE*
L61  153155 SEA PHYSICOCHEMICAL OR PHYSICO CHEMICAL
L62     363 SEA CONNECTION NETWORK*
L63  596298 SEA NODE* OR NODAL? OR ARC*
L64 1326996 SEA DATABASE* OR ALGORITHM*
L82   11219 SEA L61 AND (L56 OR L57 OR L58)
L83       1 SEA (L62 OR L63 OR L64) AND L59 AND L82



L56  142316 SEA STRUCTUR? (3A) ACTIVIT?
L57  174626 SEA (MOLECUL? OR STRUCTUR?) (5A) (BIOLOGICAL? OR CHEMICAL?)
L58  610997 SEA (MOLECUL? OR STRUCTUR?) (5A) (PROCESS? OR FUNCTION? OR
            ACTIVIT?)
L61  153155 SEA PHYSICOCHEMICAL OR PHYSICO CHEMICAL
L62     363 SEA CONNECTION NETWORK*
L63  596298 SEA NODE* OR NODAL? OR ARC*
L64 1326996 SEA DATABASE* OR ALGORITHM*
L82   11219 SEA L61 AND (L56 OR L57 OR L58)
L88       1 SEA L82 AND L64 AND (L62 OR L63)



L56  142316 SEA STRUCTUR? (3A) ACTIVIT?
L57  174626 SEA (MOLECUL? OR STRUCTUR?) (5A) (BIOLOGICAL? OR CHEMICAL?)
L58  610997 SEA (MOLECUL? OR STRUCTUR?) (5A) (PROCESS? OR FUNCTION? OR
            ACTIVIT?)



Searched by Barb O'Bryen, STIC 308-4291 






L59   98530 SEA INFERENCE*
L60   14988 SEA COOCCUR? OR CO OCCUR?
L61  153155 SEA PHYSICOCHEMICAL OR PHYSICO CHEMICAL
L64 1326996 SEA DATABASE* OR ALGORITHM*
L82   11219 SEA L61 AND (L56 OR L57 OR L58)
L89     300 SEA L82 AND L64
L90       1 SEA (L59 OR L60) AND L89


L56  142316 SEA STRUCTUR? (3A) ACTIVIT?
L57  174626 SEA (MOLECUL? OR STRUCTUR?) (5A) (BIOLOGICAL? OR CHEMICAL?)
L58  610997 SEA (MOLECUL? OR STRUCTUR?) (5A) (PROCESS? OR FUNCTION? OR
            ACTIVIT?)
L61  153155 SEA PHYSICOCHEMICAL OR PHYSICO CHEMICAL
L64 1326996 SEA DATABASE* OR ALGORITHM*
L82   11219 SEA L61 AND (L56 OR L57 OR L58)
L89     300 SEA L82 AND L64
L91     242 SEA L89 AND (CHEMICAL OR CHEMISTRY OR BIOLOG?)
L96      29 SEA L91 AND (RELATIONAL? OR NON SEQUENCE OR PRO OR BANK OR
            FOLD OR PHYSEAN OR DESCRIPTOR*)/TI



=> s l68 or l75 or l80 or l87 or l83 or l88 or l90 or l96

L99      36 L68 OR L75 OR L80 OR L87 OR L83 OR L88 OR L90 OR L96

=> dup rem l97,l99,l98

FILE 'CAPLUS' ENTERED AT 16:04:25 ON 22 OCT 2003 

USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT.

PLEASE SEE "HELP USAGETERMS" FOR DETAILS. 

COPYRIGHT (C) 2003 AMERICAN CHEMICAL SOCIETY (ACS) 

FILE 'PASCAL' ENTERED AT 16:04:25 ON 22 OCT 2003

Any reproduction or dissemination in part or in full, 

by means of any process and on any support whatsoever 

is prohibited without the prior written agreement of INIST-CNRS. 

COPYRIGHT (C) 2003 INIST-CNRS. All rights reserved. 

FILE 'INSPEC' ENTERED AT 16:04:25 ON 22 OCT 2003

Compiled and produced by the IEE in association with FIZ KARLSRUHE 
COPYRIGHT 2003 (c) INSTITUTION OF ELECTRICAL ENGINEERS (IEE) 

FILE 'LIFESCI' ENTERED AT 16:04:25 ON 22 OCT 2003 
COPYRIGHT (C) 2003 Cambridge Scientific Abstracts (CSA) 



FILE 'BIOSIS' ENTERED AT 16:04:25 ON 22 OCT 2003 
COPYRIGHT (C) 2003 BIOLOGICAL ABSTRACTS INC. (R) 

FILE 'SCISEARCH' ENTERED AT 16:04:25 ON 22 OCT 2003 
COPYRIGHT 2003 THOMSON ISI 

FILE 'WPIDS' ENTERED AT 16:04:25 ON 22 OCT 2003 
COPYRIGHT (C) 2003 THOMSON DERWENT 
PROCESSING COMPLETED FOR L97 
PROCESSING COMPLETED FOR L99 
PROCESSING COMPLETED FOR L98 

L100 45 DUP REM L97 L99 L98 (9 DUPLICATES REMOVED) 

ANSWERS '1-9' FROM FILE CAPLUS
ANSWERS '10-16' FROM FILE PASCAL
ANSWERS '17-20' FROM FILE INSPEC
ANSWERS '21-22' FROM FILE LIFESCI
ANSWERS '23-25' FROM FILE BIOSIS
ANSWERS '26-36' FROM FILE SCISEARCH
ANSWERS '37-45' FROM FILE WPIDS
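
Searcher's illustration (not part of the STN output): the effect of the duplicate-removal step above, where answer sets are merged in the order given and the first copy of each record is kept, can be sketched as follows. This is a simplified sketch, not STN's actual matching logic. The function name `dup_rem` and the idea of comparing records by a single string key are assumptions, and the sample keys are made up.

```python
def dup_rem(*answer_sets):
    """Merge answer sets in the order given, keeping the first copy of each record.

    Each answer set is a list of record keys. Returns the merged list and
    the number of duplicates that were dropped.
    """
    seen = set()
    kept = []
    removed = 0
    for answers in answer_sets:
        for key in answers:
            if key in seen:
                removed += 1  # a copy from an earlier set or position wins
            else:
                seen.add(key)
                kept.append(key)
    return kept, removed


# Hypothetical record keys, for illustration only.
kept, removed = dup_rem(["rec-a", "rec-b"], ["rec-b", "rec-c", "rec-a"], ["rec-d"])
print(len(kept), "kept;", removed, "duplicates removed")
```

Because earlier sets take priority, the order of the L-numbers in the command decides which file's copy of a duplicated record is displayed.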



=> d ibib ab 1-45; fil hom



L100 ANSWER 1 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:    2002:329153 CAPLUS
DOCUMENT NUMBER:     137:59817
TITLE:               Structural similarity to link sequence space: new
                     potential superfamilies and implications for
                     structural genomics
AUTHOR(S):           Aloy, Patrick; Oliva, Baldomero; Querol, Enrique;
                     Aviles, Francesc X.; Russell, Robert B.
CORPORATE SOURCE:    EMBL, Heidelberg, D-69117, Germany
SOURCE:              Protein Science (2002), 11(5), 1101-1116
                     CODEN: PRCIEI; ISSN: 0961-8368
PUBLISHER:           Cold Spring Harbor Laboratory Press
DOCUMENT TYPE:       Journal
LANGUAGE:            English
AB The current pace of structural biol. now means that
protein three-dimensional structure can be known before protein
function, making methods for assigning homol. via structure
comparison of growing importance. Previous research has suggested that
sequence similarity after structure-based alignment is one of the best
discriminators of homol. and often functional similarity. Here,
we exploit this observation, together with a merger of protein structure
and sequence databases, to predict distant homologous
relationships. We use the Structural Classification of Proteins (SCOP)
database to link sequence alignments from the SMART and Pfam
databases. We thus provide new alignments that could not be
constructed easily in the absence of known three-dimensional structures.
We then extend the method of Murzin (1993b) to assign statistical
significance to sequence identities found after structural alignment and
thus suggest the best link between diverse sequence families. We find
that several distantly related protein sequence families can be linked
with confidence, showing the approach to be a means for inferring
homologous relationships and thus possible functions when
proteins are of known structure but of unknown function. The
anal. also finds several new potential superfamilies, where inspection of
the assocd. alignments and superimpositions reveals conservation of
unusual structural features or co-location of conserved amino acids and
bound substrates. We discuss implications for Structural Genomics
initiatives and for improvements to sequence comparison methods.
REFERENCE COUNT:     58 THERE ARE 58 CITED REFERENCES AVAILABLE FOR THIS
                     RECORD. ALL CITATIONS AVAILABLE IN THE RE FORMAT



L100 ANSWER 2 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:    2001:477182 CAPLUS
DOCUMENT NUMBER:     136:161855
TITLE:               A model for phylogenetic inference using
                     structural and chemical covariates
AUTHOR(S):           Tavare, Simon; Adams, Dean C; Fedrigo, Olivier;
                     Naylor, Gavin J. P.
CORPORATE SOURCE:    Departments of Biological Sciences, Mathematics and
                     Preventative Medicine, University of Southern
                     California, Los Angeles, CA, 90089, USA
SOURCE:              Pacific Symposium on Biocomputing 2001, Mauna Lani,
                     HI, United States, Jan. 3-7, 2001 (2001), 215-225.
                     Editor(s): Altman, Russ B. World Scientific
                     Publishing Co. Pte. Ltd.: Singapore, Singapore.
                     CODEN: 69BLFC
DOCUMENT TYPE:       Conference
LANGUAGE:            English

AB We investigated whether or not evolutionary change in DNA sequence data
was homogeneous across different classes of base pairs. DNA sequences for
eight protein-coding mitochondrial genes were obtained for 38 vertebrate
taxa from GenBank. Each nucleotide site in the alignment was classified
according to a no. of covariates, including its codon position, genetic
code degeneracy, and hydrophobicity. The evolutionary transition matrix
for each base was estd. by tracing implied character changes under
parsimony on a known phylogenetic tree. Canonical variates analyses of
the inferred transition matrixes were performed for each gene to
det. whether or not different classes of bases behaved similarly. We
found five distinct clusters of transition matrixes that could be roughly
defined by combinations of codon position and degeneracy. This pattern
was consistent among all genes. A stochastic model of rate variation
based on the interaction of the covariates was developed to assess the
statistical significance of the clusters. The five-group
classification was found to explain significantly more sequence variation
than did a codon only classification, a codon degeneracy classification,
or a codon and degeneracy classification. The same five-group
classification was found for all genes tested, suggesting a common
process underlying the mol. evolution of the mitochondrial genome.
These results confirm that there are classes of base pairs that evolve
differently, and suggest that models of sequence evolution that
incorporate covariate information may be useful in developing nucleotide
substitution models that more accurately reflect evolutionary history.
REFERENCE COUNT:     24 THERE ARE 24 CITED REFERENCES AVAILABLE FOR THIS
                     RECORD. ALL CITATIONS AVAILABLE IN THE RE FORMAT



L100 ANSWER 3 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN 
ACCESSION NUMBER: 2000:885510 CAPLUS 

DOCUMENT NUMBER: 135:205926 

TITLE: Genetic network inference: from 

co-expression clustering to reverse engineering 

AUTHOR(S): D'haeseleer, Patrik; Liang, Shoudan; Somogyi, Roland

CORPORATE SOURCE: Department of Computer Science, University of New 

Mexico, Albuquerque, NM, 87131, USA 

SOURCE: Bioinformatics (2000), 16(8), 707-726 

CODEN: BOINFP; ISSN: 1367-4803 

PUBLISHER: Oxford University Press 

DOCUMENT TYPE: Journal; General Review 

LANGUAGE: English 

AB A review with 103 refs. Motivation: Advances in mol. biol., anal. and
computational technologies are enabling us to systematically investigate
the complex mol. processes underlying biol.
systems. In particular, using high-throughput gene expression assays, we
are able to measure the output of the gene regulatory network. We aim
here to review datamining and modeling approaches for conceptualizing and
unraveling the functional relationships implicit in these datasets.
Clustering of co-expression profiles allows us to infer shared
regulatory inputs and functional pathways. We discuss various aspects of
clustering, ranging from distances measures to clustering
algorithms and multiple-cluster memberships. More advanced anal.
aims to infer causal connections between genes directly, i.e.
who is regulating whom and how. We discuss several approaches to the
problem of reverse engineering of genetic networks, from discrete Boolean
networks, to continuous linear and non-linear models. We conclude that
the combination of predictive modeling with systematic exptl. verification
will be required to gain a deeper insight into living organisms,
therapeutic targeting and bioengineering.
REFERENCE COUNT: 103 THERE ARE 103 CITED REFERENCES AVAILABLE FOR
THIS RECORD. ALL CITATIONS AVAILABLE IN THE RE
FORMAT

L100 ANSWER 4 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:    2000:841321 CAPLUS
DOCUMENT NUMBER:     134:112480
TITLE:               13C NMR chemical shifts can predict disulfide bond
                     formation
AUTHOR(S):           Sharma, Deepak; Rajarathnam, Krishna
CORPORATE SOURCE:    Department of Human Biological Chemistry and Genetics
                     and Sealy Center for Structural Biology, University of
                     Texas Medical Branch, Galveston, TX, 77555-1055, USA
SOURCE:              Journal of Biomolecular NMR (2000), 18(2), 165-171
                     CODEN: JBNME9; ISSN: 0925-2738
PUBLISHER:           Kluwer Academic Publishers
DOCUMENT TYPE:       Journal
LANGUAGE:            English



AB The presence of disulfide bonds can be detected unambiguously only by
x-ray crystallog., and otherwise must be inferred by chem.
methods. In this study we demonstrate that 13C NMR chem. shifts are
diagnostic of disulfide bond formation, and can discriminate between
cysteine in the reduced (free) and oxidised (disulfide bonded) state. A
database of cysteine 13C C.alpha. and C.beta. chem. shifts was
constructed from the BioMagResBank (BMRB) and Sheffield databases,
and published journals. Statistical anal. indicated that the
C.beta. shift is extremely sensitive to the redox state, and can predict
the disulfide-bonded state. Further, chem. shifts in both states occupy
distinct clusters as a function of secondary structure in the
C.alpha./C.beta. chem. shift map. On the basis of these results, we
provide simple ground rules for predicting the redox state of cysteines;
these rules could be used effectively in NMR structure detn., predicting
new folds, and in protein folding studies.
REFERENCE COUNT:     8 THERE ARE 8 CITED REFERENCES AVAILABLE FOR THIS
                     RECORD. ALL CITATIONS AVAILABLE IN THE RE FORMAT



L100 ANSWER 5 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:    1996:151829 CAPLUS
DOCUMENT NUMBER:     124:242855
TITLE:               A comparison of some commercially available structural
                     descriptors and clustering algorithms
AUTHOR(S):           Brown, Robert D.; Bures, Mark G.; Martin, Yvonne C.
CORPORATE SOURCE:    Pharm. Prod. Div., Abbott Lab., Abbott Park, IL,
                     60064, USA
SOURCE:              Proceedings of the First Electronic Computational
                     Chemistry Conference [CD-ROM] (1995), Meeting Date
                     1994, Paper 12. Editor(s): Bachrach, Steven M.
                     ARInternet Corp.: Landover, Md.
                     CODEN: 62MDAN
DOCUMENT TYPE:       Conference
LANGUAGE:            English
AB Clustering methods play an important part in the selection of compds. from
chem. databases for both purchase and biol. screening. These
clustering methods usually rely on descriptors which encode the structural
features of the mols. in the databases. Structural descriptors
allow the similarities of pairs of mols. to be calcd. from the
co-occurrence of these features. Clusters may then be assembled
on the basis of the similarity measures. A no. of methods exist within
com. available database searching software to produce these
descriptors. In this paper the relative merits of some of these
descriptors, which variously describe the two-dimensional and
three-dimensional content of mols., are examd. Two com. available
clustering algorithms are also compared, one hierarchical and
one non-hierarchical. All comparisons are based on the ability of the
methods to produce sets of clusters in which biol. active and inactive
structures do not occur in the same clusters. The various descriptors of
two-dimensional structure perform better in this respect, particularly
when used in combination with the hierarchical clustering method.



L100 ANSWER 6 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:    1993:670093 CAPLUS
DOCUMENT NUMBER:     119:270093
TITLE:               Similarity criteria for chemical
                     structures and reactions
AUTHOR(S):           Gasteiger, Johann; Ihlenfeldt, Wolf D.
CORPORATE SOURCE:    Inst. Org. Chem., Tech. Univ. Munich, Garching,
                     W-8046, Germany
SOURCE:              Chem. Struct. 2 Proc. Int. Conf., 2nd (1993), Meeting
                     Date 1990, 423-38. Editor(s): Warr, Wendy A.
                     Springer: Berlin, Germany.
                     CODEN: 59IUAO
DOCUMENT TYPE:       Conference
LANGUAGE:            English
AB New definitions of similarity of chem. structures are
presented that are based on finding building blocks for synthesis and on
general types of reactions. The merits of these similarity criteria in
analyzing a database of structures and in designing org.
syntheses are illustrated. Reaction similarities are based on values for
electronic and energy effects. They allow novel search strategies for
reaction databases and inferences on reaction
conditions.



L100 ANSWER 7 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:   1991:631325 CAPLUS
DOCUMENT NUMBER:    115:231325
TITLE:              A combined model of multi-resonance
                    subspectra/substructure and DARC topological structure
                    representation. Local and global knowledge in the
                    carbon-13 NMR DARC database
AUTHOR(S):          Carabedian, Michel; Dubois, Jacques Emile
CORPORATE SOURCE:   Inst. Topol. Dyn. Syst., Univ. Paris 7, Paris, 75005,
                    Fr.
SOURCE:             Journal of Chemical Information and Computer Sciences
                    (1991), 31(4), 564-74
                    CODEN: JCISD8; ISSN: 0095-2338
DOCUMENT TYPE:      Journal
LANGUAGE:           English
AB   The structural and spectral information in a 13C NMR database
     can be represented by means of a model which relates substructural
     fragments to subspectral features for multiple resonances. The
     substructural part of this model contains a concise DARC description of
     the structural part with a partially generic ELCOb which is assocd. with
     all the spectral information pertaining to the focal atom (Fo) and its
     neighboring carbons (Ai). In the spectral information, the concentric
     environmental view is shifted from the focal atom to the neighboring
     positions. This leads to overlap in the views and redundancy in the
     information and a dissym. phys. perception which, formally, is broader than
     the substructural view. New substructural subspectral local and global
     knowledge functions of this model are managed with holog.
     techniques. Formalized local and global knowledge is described
     statistically by juxtaposition of the .delta.13CFo .times.
     .delta.13CAi correlation plane supporting the 3-dimensional occurrence
     distributions. Use of the inferential ability of these planes
     is facilitated by a table which correlates the repartitioning of the
     .sigma.- and .pi.-bonds in Fo-Ai atom pairs.

L100 ANSWER 8 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:   1989:624827 CAPLUS
DOCUMENT NUMBER:    111:224827
TITLE:              Searching for pharmacophores in large coordinate data
                    bases and its use in drug design
AUTHOR(S):          Sheridan, Robert P.; Rusinko, Andrew, III; Nilakantan,
                    Ramaswamy; Venkataraghavan, R.
CORPORATE SOURCE:   Med. Res. Div., American Cyanamid, Pearl River, NY,
                    10965, USA
SOURCE:             Proceedings of the National Academy of Sciences of the
                    United States of America (1989), 86(20), 8165-9
                    CODEN: PNASA6; ISSN: 0027-8424
DOCUMENT TYPE:      Journal
LANGUAGE:           English
AB   Pharmacophores, 3-dimensional arrangements of chem. groups essential for
     biol. activity, are being proposed in increasing nos. The authors
     developed a system to search data bases of 3-dimensional coordinates for
     compds. that contain a particular pharmacophore. The coordinates can be
     derived from expt. (e.g., Cambridge Crystal Database) or be
     generated from data bases of connection tables (e.g., Cyanamid Labs,
     proprietary compds.) via the program CONCORD. The authors discuss the
     results of searches for 3 sample pharmacophores. Two have been proposed
     by others based on the conformational anal. of active compds., and one is
     inferred from the crystal structure of a protein-ligand complex.
     These examples show that such searches can identify classes of compds.
     that are structurally different from the compds. from which the
     pharmacophore was derived but are known to have the appropriate biol.
     activity. Occasionally, the searches find bond "frameworks" in which the
     important groups are rigidly held in the proper geometry. These may
     suggest new structural classes for synthesis.
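
[Illustrative note] The kind of search this abstract describes can be pictured as a distance-constraint match over 3-dimensional coordinates. The sketch below is only an illustration under stated assumptions, not the authors' system: the data layout, the function name `matches_pharmacophore`, the group labels, the toy coordinates, and the tolerance value are all hypothetical.

```python
from itertools import permutations
from math import dist

def matches_pharmacophore(atoms, pharmacophore, distances, tol=0.5):
    """Return the first tuple of atom indices that satisfies the pharmacophore, or None.

    atoms:         list of (group_label, (x, y, z)) for one compound
    pharmacophore: list of required group labels, e.g. ["N", "O"]
    distances:     dict {(i, j): target distance} between pharmacophore points i and j
    tol:           allowed deviation from each target distance (hypothetical default)
    """
    for combo in permutations(range(len(atoms)), len(pharmacophore)):
        # Every chosen atom must carry the required group label.
        if any(atoms[a][0] != label for a, label in zip(combo, pharmacophore)):
            continue
        # Every constrained pair must lie within tolerance of its target distance.
        if all(abs(dist(atoms[combo[i]][1], atoms[combo[j]][1]) - d) <= tol
               for (i, j), d in distances.items()):
            return combo
    return None

# Toy compound with made-up coordinates: three labeled points on a line.
compound = [("N", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))]
hit = matches_pharmacophore(compound, ["N", "O"], {(0, 1): 3.0})
miss = matches_pharmacophore(compound, ["N", "O"], {(0, 1): 5.0})
```

The exhaustive permutation loop is chosen for readability; it would be far too slow for a large data base, where some form of pre-screening would be needed.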



L100 ANSWER 9 OF 45 CAPLUS COPYRIGHT 2003 ACS on STN
ACCESSION NUMBER:   1985:470570 CAPLUS
DOCUMENT NUMBER:    103:70570
TITLE:              DARC system for documentation and artificial
                    intelligence in chemistry
AUTHOR(S):          Dubois, Jacques Emile; Sobel, Yves
CORPORATE SOURCE:   Assoc. Rech. Dev. Inf. Chim., Paris, 75005, Fr.
SOURCE:             Journal of Chemical Information and Computer Sciences
                    (1985), 25(3), 326-33
                    CODEN: JCISD8; ISSN: 0095-2338
DOCUMENT TYPE:      Journal
LANGUAGE:           English
AB   The DARC system for documentation and artificial intelligence involving
     chem. structural information is described and its topol.
     concepts are discussed with respect to interactive data processing
     systems. Operational realizations of the DARC system are described,
     including the knowledge database, functions of inference
     engines, and interface with users. Computer-aided design applications to
     the database are detailed in synthesis design, and structure elucidation.

PASCAL COPYRIGHT 2003 INIST-CNRS. ALL RIGHTS RESERVED. on STN    DUPLICATE 1
ACCESSION NUMBER:    2002-0476596 PASCAL
COPYRIGHT NOTICE:    Copyright .COPYRGT. 2002 INIST-CNRS. All rights
                     reserved.
TITLE (IN ENGLISH):  Molecular descriptors that influence the
                     amount of drugs transfer into human breast milk
AUTHOR:              AGATONOVIC-KUSTRIN S.; LING L. H.; THAM S. Y.; ALANY
                     R. G.
CORPORATE SOURCE:    School of Pharmaceutical, Molecular and Biomedical
                     Science, University of South Australia, North Terrace,
                     Adelaide 5000, Australia; School of Pharmaceutical
                     Sciences, Universiti Sains, Penang 11800, Malaysia;
                     Division of Pharmacy, The University of Auckland,
                     Auckland, New Zealand
SOURCE:
Most drugs are excreted into breast milk to some extent and are
bioavailable to the infant. The ability to predict the approximate amount 
of drug that might be present in milk from the drug structure would be 
very useful in the clinical setting. The aim of this research was to 
simplify and upgrade the previously developed model for prediction of the 
milk to plasma (M/P) concentration ratio, given only the molecular 
structure of the drug. The set of 123 drug compounds, with experimentally 
derived M/P values taken from the literature, was used to develop, test
and validate a predictive model. Each compound was encoded with 71 
calculated molecular structure descriptors, including constitutional 
descriptors, topological descriptors, molecular connectivity, 
geometrical descriptors, quantum chemical descriptors,
physicochemical descriptors and liquid properties. Genetic 
algorithm was used to select a subset of the descriptors that 
best describe the drug transfer into breast milk and artificial neural 
network (ANN) to correlate selected descriptors with the M/P ratio and 
develop a QSAR. The averaged literature M/P values were used as the ANN's
output and calculated molecular descriptors as the inputs. A 
nine-descriptor nonlinear computational neural network model has been 
developed for the estimation of M/P ratio values for a data set of 123 
drugs. The model included the percent of oxygen, parachor, density,
highest occupied molecular orbital energy (HOMO), topological indices
(.sub.XV2, .sub.X2 and .sub.XI) and shape indices (K3, .kappa.2) as the
inputs, and had four hidden neurons and one output neuron. The QSPR that was
developed indicates that molecular size (parachor, density), shape
(topological shape indices, molecular connectivity indices) and 
electronic properties (HOMO) are the most important for drug transfer 
into breast milk. Unlike previously reported models, the QSPR model 
described here does not require experimentally derived parameters and 
could potentially provide a useful prediction of M/P ratio of new drugs 
only from a sketch of their structure and this approach might also be 
useful for drug information service. Regardless of the model or method 
used to estimate drug transfer into breast milk, these predictions should 
only be used to assist in the evaluation of risk, in conjunction with 
assessment of the infant's response. 
AB For new chemical substances that are notified within the
European Union, data sets have to be submitted to the National Competent
Authorities. The data submitted have to demonstrate the
physicochemical and toxic properties of the new chemical,
such as solubility, partition coefficients and spectra, as well as
acute toxic properties and the potential to cause local irritant or
corrosive effects. In order to minimise testing for notification purposes
(for example, animal testing), it is necessary to develop stepwise
assessment procedures, including structure-activity
considerations, alternative methods (for example, in vitro tests), and
computerised structure-activity relationship (SAR)
models. An electronic database was developed which contains
physicochemical and toxicological data on approximately 1300
chemical substances. It is used for regulatory structure-property
relationship (SPR) and SAR considerations, and for the development of
rules for a decision support system (DSS) for the introduction of
alternative methods into local irritancy/corrosivity testing strategies.
The information stored in the database is derived from
proprietary data, so it is not possible to publish the data directly.
Therefore, the database is evaluated by regulators, and the
information derived from the data is used for the development of
scientific information about SARs. This information can be published, for
example, by means of tables correlating measured physicochemical
values and specific toxic effects caused by the measured chemical.
This information is introduced to the public by means of a DSS that
predicts local irritant/corrosive potential of a chemical by
listing so-called exception rules of the kind IF (physicochemical
property) A THEN not (toxic) Effect B and so-called structural rules of
the kind IF Substructure A THEN Effect B. These DSS rules "translate"
proprietary data into scientific knowledge that can be published.
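
[Illustrative note] The two rule forms named in this abstract (exception rules and structural rules) can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the function name `predict_effects`, the placeholder names `property_A`, `substructure_A`, `effect_B`, the threshold, and the choice to let an exception rule override a structural rule. None of it reproduces the actual DSS or its rules.

```python
def predict_effects(chemical, exception_rules, structural_rules):
    """Apply both rule kinds to one chemical.

    exception_rules:  list of (property_test, effect) read as
                      "IF property test holds THEN not effect"
    structural_rules: list of (substructure, effect) read as
                      "IF substructure present THEN effect"
    Returns (predicted effects, excluded effects). Letting exclusions
    override structural predictions is an assumption of this sketch.
    """
    excluded = {effect for test, effect in exception_rules
                if test(chemical["properties"])}
    predicted = {effect for substructure, effect in structural_rules
                 if substructure in chemical["substructures"]}
    return predicted - excluded, excluded

# Placeholder rules and data; names and the threshold are hypothetical.
exception_rules = [(lambda props: props.get("property_A", 0.0) > 100.0, "effect_B")]
structural_rules = [("substructure_A", "effect_B"), ("substructure_C", "effect_D")]
chemical = {
    "properties": {"property_A": 150.0},
    "substructures": {"substructure_A", "substructure_C"},
}
predicted, excluded = predict_effects(chemical, exception_rules, structural_rules)
```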
AB Inference process plays an important role in the realisation of expert
systems. In this paper it is shown that chemical reactions may be used to
perform molecular inference according to the algorithm
of forward chaining. This method is accomplished by an adequate
interpretation of inorganic chemical compounds and unidirectional
reactions. In our approach premise clauses are represented by the
reactants while conclusion clauses are represented by the products of
reaction. Different inorganic compounds and reactions have been discussed 
with respect to their utility for the molecular inference. Special 
attention is focused on qualitative chemistry and a number of reactions 
has been taken into account. Experimental results demonstrating 
application of these reactions in expert systems are provided. 
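
[Illustrative note] Forward chaining, the inference algorithm named in this abstract, repeatedly fires any rule whose premises are all known until nothing new can be derived. The sketch below shows the algorithm in software only; the function name `forward_chain` and the symbolic facts are hypothetical, and the comments merely mirror the abstract's mapping of premises to reactants and conclusions to products.

```python
def forward_chain(facts, rules):
    """Derive every conclusion reachable from the initial facts.

    rules: list of (premises, conclusion), where premises is a set of facts.
    In the abstract's analogy, premises play the role of reactants and the
    conclusion plays the role of a reaction product.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule once all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Symbolic example: A and B give C, C gives D, E (never supplied) would give F.
rules = [({"A", "B"}, "C"), ({"C"}, "D"), ({"E"}, "F")]
derived = forward_chain({"A", "B"}, rules)
```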
AB The efficiency of the drug discovery process can be significantly
improved using design techniques to maximize the diversity of structure
databases or combinatorial libraries. Here, several
physicochemical descriptors were investigated to quantify
molecular diversity. Based on the 2D or 3D topological similarity of
molecules, the relationship between physicochemical metrics and
biological activity was studied to find valid descriptors.
Several compounds were selected using those descriptors from a
database containing diverse templates and 55 biological
classes. It was evaluated whether the obtained subsets represent all
biological properties and structural variations of the
original database. In addition, hierarchical cluster analyses
were used to group molecules from the parent database, which
should have similar biological properties. Using various sets
of structurally similar molecules, it was possible to derive
quantitative measures for compound similarities in relation to
biological properties. A similarity radius for 2D fingerprints
and molecular steric fields was estimated; compounds within this radius
of another molecule were shown to have comparable
biological properties. This study demonstrates that 2D
fingerprints, alone or in combination with other metrics, as the primary
descriptor allow global diversity to be handled. In addition, standard
atom-pair descriptors or molecular steric fields can be used to correlate
structural diversity with biological activity.
Hence, the latter two descriptors can be classified as secondary
descriptors useful for analog library design, while 2D fingerprints are
applicable to design a general library for lead discovery. Based on these
findings, an optimally diverse subset containing only 38% of the entire
IC93 database was generated using 2D fingerprints. Here no
structure is more similar than 0.85 to any other (Tanimoto coefficient),
but all biological classes were selected. This reduction of
redundancy led to a child database with the same
physicochemical diversity space, which contains the same
information as the original database.
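
[Illustrative note] The 0.85 Tanimoto cutoff mentioned in this abstract can be illustrated with a small sketch. The Tanimoto coefficient of two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. The greedy, order-dependent selection below is an assumption made for illustration (the abstract does not state the selection procedure), and the names `tanimoto`, `diverse_subset`, and the toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def diverse_subset(fingerprints, threshold=0.85):
    """Greedily keep structures whose similarity to every kept one is <= threshold.

    fingerprints: dict {name: set of on-bit indices}, visited in insertion order.
    """
    selected = []
    for name, fp in fingerprints.items():
        if all(tanimoto(fp, fingerprints[kept]) <= threshold for kept in selected):
            selected.append(name)
    return selected

# Toy fingerprints (made-up bit sets).
fps = {
    "a": set(range(8)),   # bits 0..7
    "b": set(range(7)),   # shares 7 of 8 bits with "a": Tanimoto 0.875
    "c": {0, 1, 2, 3},    # Tanimoto 0.5 with "a"
    "d": {20, 21},        # no bits in common with the others
}
selected = diverse_subset(fps)
```

Because the selection is greedy, the result depends on the order in which structures are visited; a different ordering could keep "b" and drop "a".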
In this paper we introduce a computer algorithm and program
Pro-Anal for analysis of the structure-activity
relationship in a family of evolutionarily related (and/or artificially
mutated) proteins/peptides. The program uses aligned amino acid sequences
with data of their activity (pK, K.sub.m, ED.sub.5.sub.0 or any other)
and searches for correlations between data on activity and various
physico-chemical characteristics of different regions
in primary structures. In automatic mode, the program generates and
verifies hypotheses on the disposition of a sequential 'modulating region'
in a protein, and key characteristics of the region. In manual mode,
users can generate and analyze their own hypotheses. The program is
implemented on IBM PC or compatible computers. It is designed to be
easily handled by the occasional computer user and yet it is powerful
enough for experienced professionals.
A quantitative structure human intestinal absorption relationship was
developed using artificial neural network (ANN) modeling. A set of 86 
drug compounds and their experimentally-derived intestinal absorption 
values used in this study was gathered from the literature and a total of 
57 global molecular descriptors, including constitutional, 
topological, chemical, geometrical and quantum chemical 
descriptors, calculated for each compound. A supervised network with 
radial basis transfer function was used to correlate calculated 
molecular descriptors with experimentally-derived measures of 
human intestinal absorption. A genetic algorithm was then used 
to select important molecular descriptors. Intestinal absorption values 
(IA%) were used as the ANN's output and calculated molecular descriptors
as the inputs. The best genetic neural network (GNN) model with 15 input 
descriptors was chosen, and the significance of the selected descriptors 
for intestinal absorption examined. Results obtained with the model that 
was developed indicate that lipophilicity, conformational stability and 
inter-molecular interactions (polarity, and hydrogen bonding) have the 
largest impact on intestinal absorption. 
Identifying homologues, defined as genes that arose from a common
evolutionary ancestor, is often a relatively straightforward task, thanks 
to recent advances made in estimating the statistical significance of 
sequence similarities found from database searches. The extent 
by which homologues possess similarities in function, however, is less 
amenable to statistical analysis. Consequently, predicting function by 
homology is a qualitative, rather than quantitative, process and requires 
particular care to be taken. This review focuses on the various 
approaches that have been developed to predict function from the scale of 
the atom to that of the organism. Similarities in homologues' functions
differ considerably at each of these different scales and also vary for 
different domain families. It is argued that due attention should be paid 
to all available clues to function, including orthologue identification, 
conservation of particular residue types, and the
co-occurrence of domains in proteins. Pitfalls in database
searching methods arising from amino acid compositional bias and 
database size effects are also discussed. 
AB Inference process plays an important role in the realisation of expert
systems. In this paper it is shown that chemical reactions may be used to
perform molecular inference according to the algorithm
of forward chaining. This method is accomplished by an adequate
interpretation of inorganic chemical compounds and unidirectional
reactions. In our approach premise clauses are represented by the
reactants while conclusion clauses are represented by the products of
reaction. Different inorganic compounds and reactions have been discussed
with respect to their utility for the molecular inference. Special
attention is focused on qualitative chemistry and a number of reactions
has been taken into account. Experimental results demonstrating
application of these reactions in expert systems are provided.
The capabilities and advantages of an integrated system for predicting the
properties of chemical compounds constructed in simultaneous application 
of the structural and physicochemical properties of compounds 
are analyzed. The construction of such a system for prediction of 
counter-productive properties of chemical compounds is considered on the 
basis of a single numerical characteristic, activation energy, taking into
account the structural formulas of molecules and of derivatives of 
molecules. One possible mathematical model for combined application of 
different parameters within the framework of a plausible inference 
system of certain properties under investigation is described; a procedure 
is presented for application of such a combined model in a JSM-system for 
automatic generation of hypotheses and questions, whether the questions 
have already been solved through use of the JSM-system for the particular 
model as well as questions that remain to be solved. 
AB The paper describes the purpose and structure of an automated dataware
system for gas dynamics with recommendations including reliability
estimates (ADGDRE). The system consists of a data bank, a generator of
simulation of the medium, a library of program modules, and a constructor 
of program modules. The physicochemical data bank consists of 
four bases, namely initial information, data preparation, recommended 
data, and model bases. The content of the base of recommended data is 
described using an example of a base of data on chemical 
reaction rate constants for molecules consisting of nitrogen and 
oxygen atoms. This database includes all the reactions between 
these molecules that are mentioned in the literature. A detailed 
discussion is devoted to the recommended data on dissociation and 
recombination reactions of diatomic molecules N2, O2.
AB The analysis of QSARs in toxicology makes use of structural features and
physicochemical parameters and is aimed at several marks such as 
the prediction of toxicity, the preliminary assessment of risk, or the 
validation of alternatives to animal experiments, etc. These various tasks 
make different requirements for the statistical models for data analysis 
as well as for the techniques to extract problem-specific data from the 
databases. Consequently, the evaluative routines should be adapted 
to the database. The structure and the contents of the 
biological database are outlined and the lateral 
communication with a spectral database is indicated. A flexible 
management of the database and the extraction of information 
from it require further utilities that are afforded by APL2-mediated 
extensions of the system. The package TRAINS permits the user-friendly and 
time-saving application of the complicated structure. Based on these 
features, the essentials of a flexible system for the data evaluation and 
its realization are described. The particulars of the arising numerical 
problems and their solution with the aid of APL2 are extensively treated. 
The article concludes with the enumeration of further objectives to be
achieved.
AB We investigate the co-occurrence of domain families in
eukaryotic proteins to predict protein cellular localization.
Approximately half (300) of SMART domains form a "small-world network", 
linked by no more than seven degrees of separation. Projection of the 
domains onto two-dimensional space reveals three clusters that correspond 
to cellular compartments containing secreted, cytoplasmic, and nuclear 
proteins. The projection method takes into account the existence of 
"bridging" domains, that is, instances where two domains might not occur
with each other but frequently co-occur with a third
domain; in such circumstances the domains are neighbors in the projection.
While the majority of domains are specific to a compartment ("locale"), 
and hence may be used to localize any protein that contains such a domain, 
a small subset of domains either are present in multiple locales or occur 
in transmembrane proteins. Comparison with previously annotated proteins 
shows that SMART domain data used with this approach can predict, with 92% 
accuracy, the localizations of 23% of eukaryotic proteins. The coverage 
and accuracy will increase with improvements in domain database 
coverage. This method is complementary to approaches that use amino-acid 
composition or identify sorting sequences; these methods may be combined 
to further enhance prediction accuracy. 
AB Here we propose an approach for predicting the activity of functional DNA
and RNA sites. This approach includes (1) identification of 
context-dependent conformational, physicochemical, and 
statistical properties of sites significant for their functioning; (2) 
development of a model on their basis for predicting site activity from 
its sequence; and (3) automatic generation of programs for predicting site 
activity based on these models. This approach has been realized as a 
computer system ACTIVITY, which includes databases of site 
activity as well as conformational, physicochemical, and
statistical properties of DNA and RNA. ACTIVITY is accessible via Internet
(http://www.bionet.nsc.ru/SRCG/Activity/) and allows real-time analysis of 
experimental data on functional site activity. We analyzed 70 samples of 
sites involved in various molecular biological 
processes and revealed statistical, conformational, and 
physicochemical properties significant for activity of these 
sites. We also developed methods for predicting site activity from their
nucleotide sequences.
AB A database of functional sites for proteins with known
structures, SITE, is constructed and used in conjunction with a
simple pattern matching program SiteMatch to evaluate possible function
conservation in a recently constructed database of fold
predictions for Escherichia coli proteins (Rychlewski L et al., 1999,
Protein Sci 8:614-624). In this and other prediction databases,
fold predictions are based on algorithms that can recognize weak
sequence similarities and putatively assign new proteins into already
characterized protein families. It is not clear whether such sequence
similarities arise from distant homologies or general similarity of
physicochemical features along the sequence. Leaving aside the
important question of the nature of relations within fold superfamilies, it is
possible to assess possible function conservation by looking at the
pattern of conservation of crucial functional residues. SITE consists of
a multilevel function description based on structure
annotations and structure analyses. In particular, active site residues,
ligand binding residues, and patterns of hydrophobic residues on the
protein surface are used to describe different functional features.
SiteMatch, a simple pattern matching program, is designed to check the
conservation of residues involved in protein activity in alignments
generated by any alignment method. Here, this procedure is used to study
conservation of functional features in alignments between protein
sequences from the E. coli genome and their optimal structural templates.
The optimal templates were identified and alignments taken from the
database of genomic structural predictions that was described in a
previous publication (Rychlewski L et al., 1999, Protein Sci 8:614-624).
An automated assessment of function conservation is used to analyze the
relation between fold and function similarity for a large number of fold
predictions. For instance, it is shown that identifying low significance
predictions with a high level of functional residue conservations can be
used to extend the prediction sensitivity for fold prediction methods.
Over 100 new fold/function predictions in this class were obtained in the
E. coli genome. At the same time, about 30% of our previous fold
predictions are not confirmed as function predictions, further
highlighting the problem of function divergence in fold superfamilies.
No. 2, pp. 115-122.
CODEN: COABER. ISSN: 0266-7061. 
Article 
English

Entered STN: 13 Jun 1997 
Last Updated on STN: 13 Jun 1997 
A method and software tool to develop patterns of protein families has 
been designed. These patterns are intended for the identification of 
local similarities in arbitrary amino acid sequences with proteins of the 
SWISS-PROT bank. The method is based on the physical, chemical 
and structural properties of amino acids. It assembles a 'best 
set' of elements (a pattern) for a given group of aligned related 
proteins. These elements provide discrimination between proteins of a 
family and representatives of other families or random sequences. The 
method combines the advantages of BLOCKS (automatic generation of multiple 
elements for protein groups), PROSITE (simplicity of element presentation) 
and matrices/profiles (different distinctions between amino acids for 
different positions of aligned sequences). Using our method, a data bank
of protein family patterns, PROF-PAT, is produced. This data bank is 
based on the 27 752 amino acid sequences of SWISS-PROT bank release 24. 
The characteristics of patterns of 743 related protein groups are 
described. The results of comparisons of PROF-PAT patterns with the 
proteins of the SWISS-PROT bank are discussed. 
A biological activity database and a
physicochemical property database are described. They
are intended to complement the protein sequence database of
PIR-International. The Biological Activity Database
and the Physicochemical Property Database contain 
information regarding the biological activity and the 
physicochemical properties of proteins, respectively. In addition 
they also provide information about wild-type molecules with which 
information concerning variant molecules may be compared. Data on 
artificial variant molecules are stored in the Artificial Variant 
Database which is described separately. 
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
The similarity principle, stating that molecules of similar structure
behave similarly, is an important concept in medicinal chemistry.
A properly characterized and well understood neighborhood behavior of
the structural space versus the activity space is
fundamental for the application of the similarity principle in
computational chemistry. In this work we focus on the
utilization of a fuzzy pharmacophore description of molecular similarity
and specifically on the influence of fuzzy pharmacophore pattern matching
on the neighborhood behavior (NB) of the similarity scoring scheme. NB is
defined as a structure activity relationship between
the intermolecular distances/dissimilarities in the pharmacophore
fingerprint structure space and the corresponding
activity differences, formally seen as intermolecular distances in
the activity spaces. The latter are defined on hand of a wide variety of
datasets on pharmacological and physico-chemical
properties and property profiles. We also investigate the clustering
behavior (CB), where the structure-activity
relationship is described in terms of distance-derived associations of
compounds into clusters via classical hierarchical clustering procedures.
The neighborhood behavior and the cluster behavior provide alternative and
complementary criteria for evaluating the pertinence of a molecular
similarity metric.
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
The quantum-chemical calculation of structures of
organic molecules belonging to 1067 modern pharmaceuticals was
carried out by semiempirical (AM1, PM3, MNDO, CNDO/2, MINDO/3) and ab
initio (6-31G) procedures taking into account the hydration effects. Each
molecule was characterized by 149 topochemical and
quantum-chemical descriptors. Basing on combination of multidimensional
analysis procedures a new method was developed for forecasting the
biological activity of organic compounds consisting in
determination of proximity of the molecules on a surface of a potential
function in the multidimensional space of descriptors (MATRIX
algorithm).
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
The autocorrelation descriptor is a molecular descriptor encoding both
molecular structure and physico-chemical
properties attributed to atoms as a vector. Applications
include QSAR studies and screening of large databases. Using
random graphs, we show that the autocorrelation descriptor may contain
highly redundant information even if the encoded properties are
independent. We show that this shortcoming can easily be eliminated by
centering properties, facilitating subsequent statistical analysis of the
generated data.
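
[Illustrative note] A common form of the topological autocorrelation descriptor sums products of atomic property values over atom pairs at each bond-count distance d, giving one vector component per d. The sketch below assumes that form (with each unordered pair counted once and d = 0 covering each atom with itself) and shows centering as subtraction of the mean property value. The function names and the toy molecule are hypothetical, and the paper's exact definition may differ.

```python
from collections import deque

def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist

def autocorrelation(adjacency, prop, max_d, center=False):
    """A(d) = sum of p_i * p_j over atom pairs i <= j at topological distance d."""
    if center:
        # Centering: subtract the mean property value from every atom.
        mean = sum(prop) / len(prop)
        prop = [p - mean for p in prop]
    dist = topological_distances(adjacency)
    vector = [0.0] * (max_d + 1)
    n = len(prop)
    for i in range(n):
        for j in range(i, n):
            d = dist[i][j]
            if d is not None and d <= max_d:
                vector[d] += prop[i] * prop[j]
    return vector

# Toy example: a 3-atom chain 0-1-2 with a made-up atomic property.
chain = [[1], [0, 2], [1]]
prop = [1.0, 2.0, 3.0]
raw = autocorrelation(chain, prop, max_d=2)
centered = autocorrelation(chain, prop, max_d=2, center=True)
```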
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
Soil sorption coefficients (K-OC) of 185 non-ionic organic 
heterogeneous pesticides have been studied searching for quantitative 
structure-property relationships (QSPRs). The chemical 
description of pesticide structure has been made in terms of 
some molecular descriptors: count descriptors, topological indices, 
information indices, fragment-based descriptors and weighted holistic 
invariant molecular (WHIM) descriptors; these last are statistical indices 
describing size, shape, symmetry and atom distribution of molecules in the
three-dimensional space. Three new topological indices derived from the 
electrotopological state indices of Kier and Hall were proposed. Multiple 
linear regression analysis was performed after previous selection of the 
descriptors mostly correlated to the response by Genetic 
Algorithms. The obtained results confirm the capability of the 
proposed approach to give predictive models for one of the most important 
partition properties, such as soil sorption coefficient (K-OC) . (C) 2000 
Elsevier Science Ltd. All rights reserved. 
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^ABSTRACT IS AVAILABLE IN THE ALL AND I ALL FORMATS* 
a large amount of information, typically generated by 
high-throughput screening, is a very difficult task. To address this 
problem, we have developed binary formal inference-based 
recursive modeling using atom and physico-chemical property class 
pair and torsion descriptors. Recursive partitioning is an exploratory 
technique for identifying structure in data. The implemented 
algorithm utilizes statistical hypothesis testing, similar to
Hawkins' formal inference-based recursive modeling program, to
separate a data set into two homogeneous subsets at each splitting
node. This process is repeated recursively until no further
separation can occur. Our implementation of recursive partitioning differs
from previously reported approaches by employing a method to extract
multiple features at each splitting node. The method was
examined for its ability to distinguish random and real data sets. The
effect of including a single descriptor and multiple descriptors in the
splitting descriptor set was also studied. The method was tested using
27 401 National Cancer Institute (NCI) compounds and their pGI50
(-log(GI(50))) against the NCI-H23 cell line. The analyses show that
partitioning using multiple descriptors is advantageous in analyzing the 
structure-activity relationship information. 
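
A minimal sketch of binary recursive partitioning may help make the procedure concrete. It is a simplification and not the authors' program: it splits on a single binary descriptor per node (the abstract describes extracting multiple features), uses Welch's t statistic with a fixed threshold in place of a formal significance test, and runs on a hypothetical six-compound data set.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def partition(X, y, rows=None, t_min=3.0):
    """Recursively split compounds on the binary descriptor with largest |t|.

    X[i][d] is 1 if compound i has descriptor d; y[i] is its activity.
    Returns a nested dict; leaves hold row indices and mean activity.
    t_min is an assumed stopping threshold, not a value from the abstract.
    """
    if rows is None:
        rows = list(range(len(y)))
    best = None
    for d in range(len(X[0])):
        present = [i for i in rows if X[i][d]]
        absent = [i for i in rows if not X[i][d]]
        if len(present) < 2 or len(absent) < 2:
            continue  # cannot estimate a variance on either side
        t = abs(welch_t([y[i] for i in present], [y[i] for i in absent]))
        if best is None or t > best[0]:
            best = (t, d, present, absent)
    if best is None or best[0] < t_min:
        return {"rows": rows, "mean": mean(y[i] for i in rows)}
    t, d, present, absent = best
    return {
        "descriptor": d,
        "t": t,
        "present": partition(X, y, present, t_min),
        "absent": partition(X, y, absent, t_min),
    }


# Hypothetical data: descriptor 0 tracks activity, descriptor 1 is noise.
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
y = [7.1, 6.9, 7.0, 4.0, 4.2, 4.1]
tree = partition(X, y)
print(tree["descriptor"], tree["present"]["rows"], tree["absent"]["rows"])
```

The recursion stops when no candidate split has at least two compounds on each side or when the best statistic falls below the threshold, mirroring the "until no further separation can occur" condition.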
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
The goal of this study was to develop a genetic neural network (GNN) 
model to predict the degree of drug transfer into breast milk, depending 
on the molecular structure descriptors, and to compare it with the current 
model. A supervised network with back-propagation learning rule and 
multilayer perceptron (MLP) architecture was used to correlate activity 
with descriptors that were preselected by a genetic algorithm. 
The set of 60 drug compounds and their experimentally derived M/P values
used in this study were gathered from the literature. A total of 61 calculated
structural features including constitutional, topological,
chemical, geometrical and quantum chemical descriptors
were generated for each of the 60 compounds. The M/P values were used as
the ANN's output and calculated molecular descriptors as the inputs.

The best GNN model with 26 input descriptors is presented, and the
chemical significance of the chosen descriptors is discussed.
Strong correlation of predicted versus experimentally derived M/P values
(R-2>0.96) for the best ANN model (26-5-5-1) confirms that there is a link
between structure and M/P values. The strength of the link is measured by
the quality of the external prediction set. With the RMS error of 0.425
and a good visual plot, the external prediction set ensures the quality of
the model.
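
For readers unfamiliar with the two quality measures quoted above, the sketch below shows how they are commonly computed. It assumes that "R-2" denotes the squared Pearson correlation between observed and predicted values (the abstract does not say which definition was used), and the numbers are made up for illustration; they are not data from the study.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(squared / len(observed))


# Made-up M/P values for illustration only.
observed = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]
r2 = r_squared(observed, predicted)
rmse = rms_error(observed, predicted)
print(round(r2, 3), round(rmse, 3))
```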

Unlike previously reported models, the GNN model described here does 
not require experimental parameters and could potentially provide useful 
prediction of M/P ratio of new potential drugs and reduce the need for 
actual compound synthesis and experimental M/P ratio determination. (C) 
2000 Elsevier Science B.V. All rights reserved. 



L100 ANSWER 32 OF 45  SCISEARCH  COPYRIGHT 2003 THOMSON ISI on STN
ACCESSION NUMBER:     2000:217924  SCISEARCH
THE GENUINE ARTICLE:  293DJ
TITLE:                PHYSEAN: PHYsical SEquence ANalysis for the
                      identification of protein domains on the basis of physical
                      and chemical properties of amino acids
AUTHOR:               Ladunga I (Reprint)
CORPORATE SOURCE:     SMITHKLINE BEECHAM PHARMACEUT, BIOINFORMAT DEPT, KING OF
                      PRUSSIA, PA 19406 (Reprint); HUNGARIAN ACAD SCI, RES GRP
                      EVOLUTIONARY GENET, H-1051 BUDAPEST, HUNGARY; LORAND
                      EOTVOS UNIV, H-1051 BUDAPEST, HUNGARY
COUNTRY OF AUTHOR:    USA; HUNGARY
SOURCE:               BIOINFORMATICS, (DEC 1999) Vol. 15, No. 12, pp. 1028-1038.
                      Publisher: OXFORD UNIV PRESS, GREAT CLARENDON ST, OXFORD
                      OX2 6DP, ENGLAND.
                      ISSN: 1367-4803.
DOCUMENT TYPE:        Article; Journal
FILE SEGMENT:         LIFE
LANGUAGE:             English
REFERENCE COUNT:      70
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
Motivation: PHYSEAN predicts protein classes with highly variable 
sequences on the basis of their physical, chemical and 
biological characteristics such as diverse hydrophobicity, 
structural propensity and steric properties. These
characteristics, calculated from multiple positions in a sequence, may be
conserved even between sequences that fail to produce alignments at any
acceptable level of statistical significance. PHYSEAN complements methods
that require sequence alignments (BLAST, FASTA, dynamic programming) by 
adding less residue- and position-specific physicochemical 
information on the protein or the domain. 

Results: We predict proteins or their domains, such as signal peptides,
using physical, chemical, geometric, and biological
properties of the 20 amino acids. This comprehensive set of properties may
cover the diagnostic functional and structural aspects
of a domain or a protein class. We automatically select and weight a
subset of properties so as to discriminate between, e.g., signal peptides 
and amino-termini of cytosolic proteins with the lowest number of 
incorrect predictions. This optimal selection of properties and their 
weights significantly decreases the number of incorrect predictions as 
compared to any single property or any combination of unweighted 
properties. Weights have been optimized by high-performance linear 
programming models that systematically find the optimal solution from 
among an astronomic number of property/weight combinations. PHYSEAN's
performance is demonstrated by highly accurate predictions of signal
peptides (the vehicles for protein transport across membranes) and their
cleavage sites. The results indicate reliable predictions are possible
even in the absence of sequence conservation using an automated physical and
chemical analysis of proteins.
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
AB   Important molecular descriptors used for establishing quantitative
structure-activity relationships are investigated to
classify similar versus dissimilar peptides. When searching for new lead
structures, synthesizing and testing compounds which are too similar
wastes time and resources. In contrast, any lead optimization program
requires the investigation of similar compounds to that lead. Thus, it is
important to maximize or minimize the structural diversity of peptides to
design useful compound libraries for lead finding or lead refinement
projects.

If a molecular descriptor is a useful measure of similarity for the 
design of peptide libraries, small differences in this descriptor for a 
pair of molecules should only translate into small 
biological differences. Using this paradigm as a basis for 
descriptor validation, it was possible to rank different molecular 
descriptors. Those physicochemical descriptors are 2D
fingerprints and five experimentally or theoretically derived principal
property scales. Some theoretically derived metrics are obtained by 
computing interaction energies or similarity indices on predefined 3D grid 
points using canonical conformations for individual amino acids. The
resulting 3D data matrices are analyzed using a principal component 
analysis leading to three principal properties for CoMFA (Comparative 
Molecular Field Analysis) or CoMSIA (Comparative Molecular Similarity 
Index Analysis) derived molecular fields. 

The descriptor validation results reveal the applicability of design 
tools on peptide data sets. Experimentally derived descriptors, in 
general, are more acceptable than computationally derived metrics, while 
the latter provide a statistically valid alternative to characterize novel 
building blocks. The CoMSIA metrics perform slightly better than the
CoMFA-based principal properties, while GRID-based descriptors are always
less acceptable.
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
Here we propose an approach for predicting the activity of functional 
DNA and RNA sites. This approach includes (1) identification of 
context-dependent conformational, physicochemical, and 
statistical properties of sites significant for their functioning; (2) 
development of a model on their basis for predicting site activity from 
its sequence; and (3) automatic generation of programs for predicting site
activity based on these models. This approach has been realized as a
computer system ACTIVITY, which includes databases of site
activity as well as conformational, physicochemical, and
statistical properties of DNA and RNA. ACTIVITY is accessible via Internet
(http://www.bionet.nsc.ru/SRCG/Activity/) and allows real-time analysis of 
experimental data on functional site activity. We analyzed 70 samples of 
sites involved in various molecular biological 
processes and revealed statistical, conformational, and 
physicochemical properties significant for activity of these 
sites. We also developed methods for predicting site activity from their 
nucleotide sequences.
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
Physico-chemical properties of polychlorinated
biphenyl (PCB) congeners have been extensively studied searching for
quantitative structure-property relationships (QSPR). The
chemical description of PCB structure is made in terms
of WHIM descriptors, which are 3D molecular descriptors taking into
account size, shape, symmetry and atom distribution of the molecules. The
regression models have been obtained by optimizing their prediction power
and by selecting the best subset of descriptors by genetic
algorithm. The results confirm the capability of this approach to
give predictive models for important physico-chemical
properties, such as relative retention time, log K-ow, melting point,
total surface area, Henry's law constant, solubility, and aqueous activity 
coefficients. (C) 1998 Elsevier Science B.V. All rights reserved.
*ABSTRACT IS AVAILABLE IN THE ALL AND IALL FORMATS*
Three-dimensional molecular indices (WHIM descriptors), proposed in
Part 5 [1] are used to search for quantitative structure-activity
relationships to investigate the physico-chemical properties and
biological activities of different classes of environmentally important
compounds.

Chlorobenzenes are studied for their interesting physico-chemical
properties, e.g., melting and boiling points, solubility,
lipophilicity (logK(ow)), bioconcentration factor (BCF), and for toxicity
(Microtox test and algae). The antagonism of
N,N-dimethyl-2-halophenethylamines to epinephrine and histamine is
successfully modelled and compared with other models in the literature.
Finally, good QSAR
models are obtained for modelling the receptor binding affinities (RE) and
inductions of aryl hydrocarbon hydroxylase (AHH) for some dioxin analogue
compounds, polyhalogenated aryl derivatives.

All the obtained models confirm the high modelling power of the WHIM
descriptors.
NOVELTY - A database containing molecular descriptors, especially
2D and 3D biological, chemical or physical data, is established.

DETAILED DESCRIPTION - A database containing molecular
descriptors, especially 2D and 3D biological, chemical or physical data, is
established. A model is provided for generating quantitative
structure property activity relationship (QSPAR) and
significant descriptors are selected according to their influence on
the QSPAR. The model is verified by using a quality parameter and the
process of generation of the relationships is continued until the
parameter reaches a predetermined value.

INDEPENDENT CLAIMS are also included for: 

(1) QSPAR generation system; and 

(2) Computer program product storing QSPAR generation instructions. 
USE - For chemical structure/biological activity
research, especially for generating quantitative structure
property activity relationship (QSPAR) between structure of
chemical compounds and their pharmacological activity for prophylaxis and
for treatment of various diseases.

ADVANTAGE - The validated QSPAR model efficiently provides true 
relationships between the structure of chemical compounds and their 
pharmacological activity. 

DESCRIPTION OF DRAWING(S) - The figure shows the flow diagram
illustrating QSPAR generation method.
NOVELTY - A molecular design method is provided to construct a chromosome
with weighting values of probes as genes, to optimize an average weighting
value of the probes by applying a square method and a genetic
algorithm alternately or repeatedly, and to analyze a three
dimensional quantitative structure activity relation
so that it can estimate a physiological activity of an unknown chemical
compound.

DETAILED DESCRIPTION - The method comprises steps of generating 
probes for calculating probe interaction energy (100), generating initial 
objects by expressing weighting values of the probes as genes (200), 
obtaining a linear coefficient of chromosome by using a square method for 
expressing a relation between the weighted probe interaction energy and 
the physiological activity (300), obtaining the average weighting value of 
the probes, and then obtaining the linear coefficient based on the average
weighting value (400), obtaining a better weighting value of the probe by
using a genetic algorithm (600), and obtaining the physiological
activity by using spatial coordinates, a partial charge, and a final
weighting coefficient of the probes (700).
PRIORITY APPLN. INFO: CN 2002-121283 20020613

AB CN 1381166 A UPAB: 20030402

NOVELTY - An expert system for management and disease and pest prevention
and elimination of alfalfa is composed of camera unit, intelligent
controller, executing mechanism and planting the alfalfa in field. The
growth state of alfalfa is picked up by camera and then compared with the
management parameters in the database. After inference
and analysis, the disease is judged and correct operating parameters are
given out and are sent to the executing mechanism to apply related
agricultural chemical. It can increase the yield of alfalfa by
more than 10%.
Protein analysis in a biological system involves sampling
the system after exposing it to a stimulus, treating the
multiple samples by separation technique and analyzing
the samples by parallel mass spectrometry.
NOVELTY - Analysis of proteins in a biological system involves:

(a) exposing the system to a stimulus; 

(b) sampling the system at multiple time intervals; 

(c) treating the multiple samples by separation technique to provide 
multiple protein samples; and 

(d) analyzing the multiple samples to determine changes in protein 
abundance as a function of time. 

DETAILED DESCRIPTION - Analysis of proteins in a biological system
involves:

(a) exposing the system to a stimulus; 

(b) sampling the system at multiple time intervals; 

(c) treating the multiple samples by separation technique to provide 
multiple protein samples; and 

(d) analyzing the multiple samples to determine changes in protein 
abundance as a function of time. 

The analysis includes directing mass spectral data from a parallel 
array of mass spectrometry systems to a common computing device and 
correlating the mass spectral data as a function of time. The mass 
spectral data is indicative of the identity and the abundance of protein 
in the multiple sample. 
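
The time-correlation step can be sketched as follows. This is an illustrative assumption about what "correlating the mass spectral data as a function of time" could look like in practice, namely fold change relative to the first sample plus a Pearson correlation between abundance profiles; the protein names, sampling times and abundances are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fold_change(series):
    """Abundance at each time point relative to the first sample."""
    return [v / series[0] for v in series]


# Hypothetical abundances (arbitrary units) at sampling times 0, 5, 10, 20 min.
abundance = {
    "protein_A": [100.0, 150.0, 200.0, 250.0],
    "protein_B": [80.0, 120.0, 160.0, 200.0],
    "protein_C": [60.0, 45.0, 30.0, 15.0],
}
changes = {name: fold_change(s) for name, s in abundance.items()}
r_ab = pearson(abundance["protein_A"], abundance["protein_B"])
r_ac = pearson(abundance["protein_A"], abundance["protein_C"])
print(changes["protein_A"], round(r_ab, 3), round(r_ac, 3))
```

Profiles that rise together give a correlation near +1, while a protein that is depleted as another accumulates gives a value near -1; either pattern could flag a pair for closer inspection.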

AN INDEPENDENT CLAIM is also included for a system for mass 
spectrometric analysis comprising: 

(i) a parallel sample separation apparatus (A) adapted to separate 
multiple samples in parallel for analysis by mass spectrometry; 

(ii) a parallel array of mass spectrometry systems (B) adapted to
receive the samples from (A); and

(iii) a common computing device (C) communicating with (A) and (B).
(C) is adapted to analyze the mass spectral data from (B) as a function of 
sample identity. 

USE - For analyzing proteins in a biological system (claimed) e.g. a 
proteome, nucleotides or other biological molecules. 

ADVANTAGE - The method achieves the analysis of a large number of
proteins in an accurate, time-effective manner. The method allows the
samples to be analyzed on a time scale governed only by the rate of the
biological changes to be observed and not by the rate at which the mass
spectrometer performs the analysis. The method also allows one to
infer the order of interactions between and among proteins without
any advance knowledge of pairs of interacting proteins as required by the
protein interaction experiments. The potential for artifactual and false
observation of protein interactions occurring in vitro is reduced as all
protein interactions occur in vivo in their proper subcellular
compartments. The method provides simultaneous recognition of multiple
protein interaction pathways and their points of intersection. The method
determines the time dependent appearance and disappearance of protein in
normal cells compared to a cell treated with drug or perturbed by a
disease or other factor. This is highly desirable in selecting alternative
points of drug action in cases where the drugs have undesired reactions.
The method not only reveals increases and decreases in the abundance of
particular proteins over time but also reveals shifts in structural state of
those proteins with total abundance. The method identifies points at which
protein modifications have occurred and reports the degree of modification
of any protein.
frequency of HTs in a population, to find correlations between HTs or
genotype and a clinical outcome and to predict HTs from an individual's
genotype for a gene, are new.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for the
following:

(1) a method (M) for generating a HT database for a
population, comprising data elements representative of the HTs for at
least 1 locus from the individuals (indivs) in the database;

(2) a (M) of predicting the presence of a HT pair in an indiv;

(3) a (M) for identifying a correlation between a HT pair and a
clinical response to a treatment, or other phenotype Pt.;

(4) a (M) for identifying a correlation between a HT pair and a
susceptibility to a condition or a disease of interest (OI), or other Pt.
(OI);

(5) a (M) of predicting an indiv's response to a medical or
pharmaceutical treatment;

(6) a computer implemented (C-I) (M) for generating a gene structure
screen for display on a display device (DD);

(7) a C-I (M) for generating a HT pair frequency screen for display
on a DD;

(8) a C-I (M) for generating a linkage screen for display on a DD;

(9) a C-I (M) for generating a phylogenetic tree screen for display
on a DD;

(10) a C-I (M) for generating a genotype (Gt.) analysis screen for
display on a DD;

(11) a (M) of displaying clinical response values of a subject
population as a function of HT pairs of the indivs in the population;

(12) a C-I (M) for carrying out a genetic algorithm for
finding an optimal set of weights to fit a function of polymorphic site
data to a clinical response measurement;

(13) a C-I (M) for displaying correlations between clinical outcome 
values for a selected population; 

(14) a (M) for conducting a clinical trial of a treatment protocol
for a medical condition (OI);

(15) a (M) of inferring Gts. of indiv subjects for a
selected gene having polymorphic sites;

(16) a (M) of determining polymorphic sites or sub-HTs that correlate
with a clinical response or outcome (OI);

(17) a (M) of determining polymorphic sites or sub-HTs that correlate
with a clinical response or outcome (OI);

(18) a computer usable (C-U) medium (Md.) having computer readable
(C-R) program code (PC) stored upon it, for causing a computer (Comp.) to
adjust observed HT pair frequencies within a population group (the HT pair
frequencies are stored in a C-R database of HT information for a
gene or gene feature (OI));

(19) a C-U Md. having C-R PC stored upon it, for causing HT pair
assignments to be made to an indiv member of a population whose Gt.
information for a gene feature (OI) is stored in a C-R form;

(20) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
identify a correlation between a clinical response to a treatment or other
Pt. and a HT or HT pair present at a candidate locus associated with the
clinical response or other Pt.;

(21) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
identify a correlation between an indiv's susceptibility to a condition or
disease (OI) or other Pt., and a HT or HT pair present at a candidate
locus associated with the susceptibility to the condition or disease (OI)
or other Pt. (OI);

(22) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
predict an indiv's response to a medical or pharmaceutical treatment based
on one or more selected HTs or HT pairs of the indiv;

(23) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
display a gene's structure and gene features on a display device DD;

(24) a C-R Md. having C-R PC stored upon it, for causing a Comp. to
display on a DD, HT frequency data within a population of indivs, for a
selected gene or gene feature;

(25) a C-R Md. having C-R PC stored upon it, for causing a Comp. to
display on a DD, polymorphic site linkage data for a gene or gene (OI);

(26) a C-R Md. having C-R PC stored upon it, for causing a Comp. to
display on a DD a phylogenetic tree;

(27) a C-R Md. having C-R PC stored upon it, for causing a Comp. to
display a Gt. analysis screen on a DD;

(28) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
display clinical response values, or other Pt. data, of a subject
population as a function of HT pairs of the indivs in the population;

(29) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
display on a DD, clinical response values, or other Pt. data, of a subject
population as a function of HT pairs of the indivs in the population for a
gene or gene feature (OI);

(30) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
carry out a genetic algorithm for finding an optimal set of weights to fit
a function of polymorphic site data for a gene or gene feature (OI) to a
clinical response measurement;

(31) a C-U Md. having C-R PC stored upon it, for causing a Comp. to 
display on a DD, correlation between clinical outcome values obtained from 
selected clinical outcome measures for a selected population; 

(32) a C-U Md. having C-R PC stored upon it, for causing a Comp. to 
provide information of use in conducting clinical trials of a treatment 
protocol for a medical condition (OI);

(33) a C-U Md. having C-R PC stored upon it, for causing a Comp. to
infer Gts. of indiv subjects for a selected gene having polymorphic sites;

(34) C-U media having C-R PC stored upon it, for causing a Comp. to
determine polymorphic sites or sub-HTs that correlate with a clinical
response or outcome (OI), or other Pt. (OI);

(35) Comps. programmed to carry out the above (Ms) or comprising the
above Comp.-useable or -readable media, comprising a memory with at least
1 region for storing Comp. executable PCs and a processor for executing
the PC stored in the memory;

(36) a data structure for storing and organizing biological
information, stored on a C-R Md. and accessible by a processor, which
comprises a single parent table which is adapted for storing, organizing
and retrieving a number of genetic features by the relative positional
relationships between the genetic features;

(37) a (M) for storing and organizing biological information; and

(38) a data structure for storing and organizing biological
information, stored on a C-R Md. and accessible by a processor, which
comprises at least 2 different fields, one of which includes a number of
genetic features, and the other of which includes relative positional
relationships between the genetic features.

Note: Further details of the above are given in the specification but 
had to be omitted from this abstract due to insufficient space. 

USE - The methods, computer programs and databases are for analyzing and
making use of gene haplotype (HT) information, e.g. to determine the frequency
of HTs in a population, to find correlations between HTs or genotypes and
a clinical outcome or the effects of a therapeutic intervention and/or to
predict HTs from an individual's genotype for a gene.
Dwg. 0/49
NOVELTY - Generation of optimal quantitative structure-activity
relationship (QSAR) among a series of molecules,
comprising using a molecular hologram molecular structural descriptor for
each molecule in the series, where each molecule is associated with an
activity value, is new.

DETAILED DESCRIPTION - Generation of optimal quantitative
structure-activity relationship among a series of
molecules comprising:

(a) defining a list of values for hologram length and fragment size
range;

(b) selecting a value from the list for length L; 

(c) selecting values from the list for fragment size in M-N; 

(d) using selected values of M and N which define a molecular 
hologram molecular structural descriptor for each molecule in the series; 

(e) correlating the molecular hologram molecular structural 
descriptor and activity value of each molecule with all the 
other molecules to obtain a structure-activity 
relationship; 

(f) repeating steps (b)-(e) for all values of L on the list; 

(g) selecting the optimal structure-activity
relationship based on the statistical correlation values; and

(h) outputting the selected optimal structure-activity
relationship for the values of L-N used for the molecular
hologram generation along with statistical significance measurements.

An INDEPENDENT CLAIM is also included for generating a weighted 
2-Dimensional (2D) fingerprint of a molecule comprising: 

(1) generating a list of all fragments found in the molecule having a 
minimum size of M and maximum size of N including branched and cyclic
fragments;

(2) producing a unique representation of each fragment; 

(3) generating a pseudo-random number from the unique representation
of each fragment;

(4) assigning each fragment to a specific position in the fingerprint 
using operator modulus with the length L and the pseudo-random number; and 

(5) incrementing the value stored at each fragment position for each 
occurrence in the molecule assigned to that position. 
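
Steps (1) to (5) of the fingerprint claim can be sketched as below. Several simplifications are assumptions made for illustration: only linear (unbranched) fragments are enumerated although the claim also covers branched and cyclic ones, `zlib.crc32` stands in for the unspecified pseudo-random number generator, and the three-atom molecule is hypothetical.

```python
import zlib


def linear_fragments(atoms, bonds, m, n):
    """Yield a canonical string for every simple path of m..n atoms."""
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def extend(path):
        if len(path) >= m and (len(path) == 1 or path[0] < path[-1]):
            # Each undirected path is reported once; the canonical form is
            # the smaller of the forward and reversed label sequences.
            forward = tuple(atoms[i] for i in path)
            yield "-".join(min(forward, forward[::-1]))
        if len(path) < n:
            for nxt in adjacency[path[-1]]:
                if nxt not in path:
                    yield from extend(path + [nxt])

    for start in range(len(atoms)):
        yield from extend([start])


def hologram(atoms, bonds, length, m, n):
    """Weighted 2D fingerprint: fragment counts binned by hash modulo length."""
    counts = [0] * length
    for fragment in linear_fragments(atoms, bonds, m, n):
        number = zlib.crc32(fragment.encode("ascii"))  # stand-in hash
        counts[number % length] += 1  # one increment per occurrence
    return counts


# Hypothetical heavy-atom graph C-C-O.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
fragments = sorted(linear_fragments(atoms, bonds, 1, 3))
fingerprint = hologram(atoms, bonds, length=7, m=1, n=3)
print(fragments)
print(fingerprint)
```

Because the bin index is a hash taken modulo the length L, different fragments can collide in one bin, which is presumably why the main claim searches over several values of L and fragment sizes M and N before settling on a model.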

USE - For generating optimal quantitative structure-activity
relationship among series of molecules.

ADVANTAGE - Powerful chemometric techniques are applied to the
molecular holograms to yield predictive quantitative structure-activity
models. The process determines the optimal set of
parameters to use in hologram generation so that the resultant hologram
yields the optimal validated QSAR model. It provides huge benefits to the
user and extends the scope of quantitative structure-activity
relationship (QSAR) modeling to a wider application, e.g.
CoMFA or Apex-3D. The technique can be automated.
Dwg. 0/9
NOVELTY - Identifying novel nucleic acid molecules encoding a protein of
interest, using regulatory networks, is new.

DETAILED DESCRIPTION - Identifying novel nucleic acid molecules
encoding a protein of interest, using regulatory networks, is new. The
method comprises:

(a) selecting a specific protein from a species involved in a 
regulatory network of interest; 

(b) identifying known proteins that act upstream and downstream of 
the protein, within the regulatory network; 

(c) constructing the regulatory network of interest from the proteins
identified in (b);

(d) for each identified protein, selecting a domain or motif and 
searching by homology for related proteins in a second species, a related 
protein has a homologous domain or motif; 

(e) producing a regulatory network for the second species, which 
incorporates the identified related proteins; 

(f) comparing the networks of the two species; 

(g) identifying a protein present in only one of the networks; and 

(h) isolating a nucleic acid molecule encoding the protein identified 
in (g) in the species in which it is missing. 

INDEPENDENT CLAIMS are also included for the following: 

(1) identifying the effect of a gene knockout on a regulatory
pathway, comprising:

(a) identifying the shortest non-oriented pathway connecting two gene 
products; 

(b) assigning an initial sign value of minus to the knockout since 
the knockout gene is inactive; 

(c) moving along the shortest pathway between the two gene products 
multiplying the sign with the sign of the next gene product in the 
pathway, where minus stands for inhibition and plus stands for induction 
or activation and zero stands for lack of interaction between two proteins 
in the specified direction; and 

(d) determining the final sign at the end of the pathway, where minus 
indicates inhibition and plus indicates induction or activation of the 
pathway; 

(2) identifying a novel nucleic acid molecule encoding a protein of 
interest, comprising: 

(a) selecting a gene of interest and searching a database
for homologous sequences;

(b) aligning the sequences identified in (a); 

(c) constructing a gene tree using the sequence alignment; 

(d) constructing a species tree; 

(e) inputting the species tree and gene tree into an 
algorithm which integrates the species tree and gene tree into a 
reconciled tree; and 

(f) identifying orthologous genes present in one species but missing 
in another; 

(3) identifying a novel gene, comprising:

(a) defining a motif or domain composition of a gene of interest;

(b) searching for sequences which correspond to nucleotide sequences
in an expression sequence tag database or other cDNA
database using a program such as BLAST and retrieving the
identified sequences;

(c) searching additional databases for expressed sequence
tags containing the domains and motifs characteristic for the gene of
interest with a hidden Markov model of domains and motifs identified in
(a); and

(d) identifying nucleotide sequences comprising the gene of interest; 

(4) extracting information on interactions between biological 
entities from natural-language text data, comprising: 

(a) parsing the text data to determine its grammatical structure; and 

(b) regularizing the parsed text data to form structured word terms;
and

(5) a computer system for extracting information on biological
entities from natural-language text data, comprising:

(a) means for parsing the natural-language text data; and

(b) means for regularizing the parsed text data to form structured
word terms.
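
The sign-propagation procedure of independent claim (1) above can be sketched as follows. The sketch assumes that "the sign of the next gene product" means the sign of the directed interaction from the current gene product to the next one, with 0 where no interaction exists in that direction; the four-gene network is hypothetical.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Shortest path ignoring edge direction (breadth-first search)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def knockout_effect(edges, knocked_out, target):
    """Return -1 (inhibition), +1 (induction) or 0 (no directed effect)."""
    path = shortest_path(edges, knocked_out, target)
    if path is None:
        return 0
    sign = -1  # the knocked-out gene is inactive
    for u, v in zip(path, path[1:]):
        sign *= edges.get((u, v), 0)  # 0: no interaction in this direction
    return sign


# Hypothetical network: A activates B, B inhibits C, D activates C.
edges = {("A", "B"): +1, ("B", "C"): -1, ("D", "C"): +1}
print(knockout_effect(edges, "A", "C"))  # path A, B, C
print(knockout_effect(edges, "D", "C"))  # path D, C
print(knockout_effect(edges, "A", "D"))  # path A, B, C, D; C does not act on D
```

Knocking out an activator of an inhibitor yields a positive final sign, that is, the downstream product is released from inhibition, whereas a path that runs against the direction of an interaction collapses to zero.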

USE - For identifying novel genes and for natural language processing 
and extraction of relational information associated with genes and 
proteins that are found in genomics journal articles. 

ADVANTAGE - The method allows the rapid retrieval of information from
literature and manipulation of derived functional data, removing a
researcher's need to perform laborious reading and manual integration of
research articles.
Dwg. 0/23 

NOVELTY - A method for genomic data discovery comprises discovering
knowledge, in parallel, about all selected genes in a database of at least 
10 genes (I), and using the acquired knowledge to repeat the procedure 
several times. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: 

(1) a method for discovering genomic knowledge, comprising 
determining at least one data element (DE), for at least one (I), 
searching at least 50 databases for this DE, and analyzing responses to 
increase knowledge of (I); 

(2) a method of automated knowledge discovery comprising continuously 
operating a cycle that comprises querying a database to receive data, 
drawing inferences from the data to generate knowledge, and re-evaluating
the inferences when the database is modified;

(3) a method for discovering genomic knowledge by selecting a gene 
token (GT), determining data requirements for GT, requesting and receiving
data responsive to these requirements, analyzing the information to 
increase knowledge of GT and repeating the procedure at least 50 times; 

(4) a knowledge discovery system comprising a unit for determining 
data needs and analyzing returned data responsive to these needs, and at 
least 10 adapter units for accessing at least 10 dissimilar data sources 
to provide the data required; 

(5) a method of ranking (I) for a particular application by 
computer-based application, without additional operator input, of 
application-specific ranking rules to many GT; 

(6) a method of genomic information analysis, comprising applying 
inference rules to two models of a biological relationship, 
interrelating different sets of genes or proteins, and applying inference 
rules to the models to infer missing information; and 

(7) an automated method of genomic knowledge discovery by analyzing 
GT to determine required data and either asking a human expert for data
or generating by computer, without additional operator input, a work order 
to a laboratory to produce the data. 
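
The cycle in claim (2) above (query a database, draw inferences, re-evaluate when the database is modified) can be sketched as follows. This is only an illustration under stated assumptions: the `Source` class, its version counter, and the toy inference rule in `infer` are all hypothetical, since the abstract does not say how modifications are detected or which inference rules are applied.

```python
def infer(data):
    """Hypothetical rule: genes annotated with the same pathway are grouped together."""
    by_pathway = {}
    for gene, pathway in data.items():
        by_pathway.setdefault(pathway, []).append(gene)
    return {p: sorted(g) for p, g in by_pathway.items() if len(g) > 1}


class Source:
    """Toy database that bumps a version number on every modification (assumption)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 0

    def update(self, gene, pathway):
        self.data[gene] = pathway
        self.version += 1

    def query(self):
        return dict(self.data), self.version


def run_cycle(source, knowledge, seen_version):
    """One pass of the cycle: query, then re-derive inferences only if the data changed."""
    data, version = source.query()
    if version != seen_version:
        knowledge = infer(data)
    return knowledge, version


source = Source({"g1": "apoptosis", "g2": "apoptosis", "g3": "glycolysis"})
knowledge, seen = run_cycle(source, {}, None)       # first pass draws inferences
source.update("g4", "glycolysis")                   # the database is modified
knowledge, seen = run_cycle(source, knowledge, seen)  # inferences are re-evaluated
print(knowledge)
```

In a continuously operating system `run_cycle` would be called in a loop or on a schedule; the two explicit calls here stand in for that.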

USE - The method is used for the development of drugs, cosmetics, 
food additives, pesticides, herbicides and other biologically active 
agents. More generally, similar methods can be used to process industrial
or financial information. 

ADVANTAGE - The method is automated to allow manipulation of more 
information than could be handled by a human operator, i.e. it overcomes 
difficulties associated with scale, updating, errors, heterogeneity and 
complexity of databases. It can be operated continuously to take account 
of changes in knowledge and/or available resources, both external and 
internal, and may include self-monitoring to identify the most dependable 
data sources or to identify/correct errors. 
Dwg. 0/5 

Database of molecular fragments is prepared by:

(a) identifying all sequentially attached fragments within a selected 
molecule, 

(b) counting the occurrences of each unique fragment, and

(c) storing information correlating fragment counts with fragment 
identity in computer-readable form. 
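
Steps (a) to (c) above can be sketched in a few lines. As a loud simplification, the molecule is modelled here as a linear chain of atom labels and a "sequentially attached fragment" is taken to be any contiguous run of up to `max_len` atoms; real molecules are graphs with branches and rings, and the abstract does not define the fragment enumeration. The function name `count_fragments`, the `max_len` parameter and the JSON storage format are assumptions for illustration.

```python
import json
from collections import Counter


def count_fragments(atoms, max_len=3):
    """Count every contiguous run of 1..max_len atoms in a linear chain (assumption)."""
    counts = Counter()
    for start in range(len(atoms)):
        last = min(start + max_len, len(atoms))
        for end in range(start + 1, last + 1):
            counts["-".join(atoms[start:end])] += 1
    return counts


# (a) + (b): enumerate fragments of a toy three-atom chain and count each unique one.
table = count_fragments(["C", "C", "O"])

# (c): store the fragment-identity-to-count mapping in computer-readable form.
record = json.dumps(table, sort_keys=True)
print(record)
```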

Also claimed are: 

(1) data processing system for creating such databases, and 

(2) computer-readable medium on which the databases are 
stored. 

USE - Comparison of fragment counts between a molecule and a 
reference molecule of known activity can be used to predict which compound 
will have this particular activity. The method is especially applied to 
libraries of drugs (e.g. central nervous system drugs), toxins or randomly 
chosen compounds.
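
One way to compare fragment counts between a candidate and a reference molecule is a Tanimoto-style similarity on the count tables, sketched below. The abstract does not name a comparison measure, so this choice, the function `count_similarity` and the example counts are all assumptions made for illustration.

```python
from collections import Counter


def count_similarity(a, b):
    """Tanimoto-style score on fragment counts: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


# Hypothetical fragment-count tables for a reference of known activity and a candidate.
reference = Counter({"C": 2, "C-C": 1, "O": 1})
candidate = Counter({"C": 1, "C-O": 1, "O": 1})

score = count_similarity(reference, candidate)
print(score)
```

A score near 1 means the two molecules share most fragments in similar numbers; any threshold for predicting activity would have to be calibrated against compounds of known activity.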

ADVANTAGE - The databases provide a complete and systematic 
classification of function/activity based on specific topological 
characteristics of fragments of small molecules, and are suitable for
construction of combinatorial libraries covering the whole of chemical 
space or focused on part of it for precise selection of active molecules. 
Dwg. 13a/18